Evolutionary force in confamiliar marine vertebrates of different temperature realms: adaptive trends in zoarcid fish transcriptomes

Background Studies of temperature-induced adaptation on the basis of genomic sequence data were mainly done in extremophiles. Although the general hypothesis of an increased molecular flexibility in the cold is widely accepted, the results of thermal adaptation are still difficult to detect at proteomic down to the genomic sequence level. Approaches towards a more detailed picture emerge with the advent of new sequencing technologies. Only small changes in primary protein structure have been shown to modify kinetic and thermal properties of enzymes, but likewise for interspecies comparisons a high genetic identity is still essential to specify common principles. The present study uses comprehensive transcriptomic sequence information to uncover general patterns of thermal adaptation on the RNA as well as protein primary structure. Results By comparing orthologous sequences of two closely related zoarcid fish inhabiting different latitudinal zones (Antarctica: Pachycara brachycephalum, temperate zone: Zoarces viviparus) we were able to detect significant differences in the codon usage. In the cold-adapted species a lower GC content in the wobble position prevailed for preserved amino acids. We were able to estimate 40-60% coverage of the functions represented within the two compared zoarcid cDNA-libraries on the basis of a reference genome of the phylogenetically closely related fish Gasterosteus aculeatus. A distinct pattern of amino acid substitutions could be identified for the non-synonymous codon exchanges, with a remarkable surplus of serine and reduction of glutamic acid and asparagine for the Antarctic species. Conclusion Based on the differences between orthologous sequences from confamiliar species, distinguished mainly by the temperature regimes of their habitats, we hypothesize that temperature leaves a signature on the composition of biological macromolecules (RNA, proteins) with implications for the transcription and translation level. As the observed pattern of amino acid substitutions only partly support the flexibility hypothesis further evolutionary forces may be effective at the global transcriptome level.


Background
Marine ectotherms specialize to different thermal windows and undergo genomic changes under evolutionary forces seen as adaptation to environmental conditions. Temperature is a crucial abiotic factor causing seasonal variability in ecosystems and mainly distinguishing polar from temperate habitats, thereby determining the distribution of species on large scales [1]. Various molecular responses are described for seasonal temperature changes that range from alterations in the transcript amounts, enzyme activities and the resulting shifts in the energy-and housekeeping metabolism (for review see [2]). Furthermore, regulative mechanisms become effective in ectotherms, like the induction of specific heat shock proteins helping to adjust the metabolic functioning to seasonal warming [3]. In the cold, for example, glycoproteins are expressed to prevent freezing in subzero waters [4].
On longer time scales adaptation to temperature requires an effective housekeeping on the metabolic level (enzyme kinetics) as well as on the transcription level. An increase of protein numbers can only partly compensate for a loss in activity during seasonal acclimatization or permanent adaptation to cold, limited by constraints in cellular space and energy availability [5]. Hence, structural modifications at the protein level are necessary to improve enzyme operation in the cold. Different temperature adaptations and dependencies of the kinetics of single enzymes like the phosphoglycerate kinase [6,7], chitobiase [7] or the lactate dehydrogenase [8] were studied intensively complemented by comparative analyses at the amino acid sequence level [9], discussing thermal acclimation up to adaptation at the molecular level.
These studies described common principles of thermal adaptation in a limited number of enzymes and their three-dimensional structures. The flexibility hypothesis implies that thermal adaptation of enzymes is accompanied by shifts in catalytic turnover numbers (k cat ) and catalytic efficiency (k cat /K m ). For cold adaptation this demands a higher structural flexibility of the protein sometimes at the expense of reduced thermal stability [5,8,10,11]. For maintaining a certain degree of flexibility cold-adapted enzymes tend to have fewer salt links, less interactions within the hydrophobic core, a reduction in the number of proline and arginine residues, a reduction in the hydrophobicity of the enzyme, and improved solvent interactions with a hydrophilic surface via additional charged side chains. In most cases, the catalytic and binding centres are not changed, but mutations under adaptive pressure of temperature are changing the stability, the barriers of activation energy as well as the accessibility of the catalytic cleft through conformational changes in the structure [12].
Research on single enzymes demonstrated that several protein families adapt with different strategies, which even contradict with other observed patterns to some extent. After all, single case studies are not sufficient to identify global adaptive patterns in sequences at the DNA and protein level due to environmental temperature.
Recent genomic and proteomic studies on distantly related extremophiles display coherence between the optimal growth temperature (OGT) and the composition of biological macromolecules (DNA, RNA and proteins), in which higher GC contents prevail on a large scale in warm-adapted species [12][13][14][15][16]. Moreover, similar GCtrends were detected in eukaryotic synonymous sequences, when comparing poikilotherm and homeotherm species [15][16][17]. Subsequently a modified amino acid composition [13][14][15]17] of proteins can be observed, e.g., for nonsynonymous mutations. In the present study we aimed to analyze evolutionary trends in marine ectotherm vertebrates of the same family based on transcriptomic data obtained from two normalized cDNA libraries. Since 454 sequencing-techniques are developed and improved, a number of transcriptomic libraries of various species were published, giving insights in the molecular networks comprising various developmental stages and tissue types [18,19]. The growing availability of sequence libraries allows for more specific inter-comparisons even in nonmodel organisms, e.g., by linking sequence data in closely related species in a pairwise manner.
The family of Zoarcidae (eelpouts) comprises 284 described species, distributed all over the global oceans [20]. Fishes of this cosmopolitan family are therefore appropriate to address the degree of adaptation to temperature in different realms by comparative methods. The eurythermal species Zoarces viviparus lives in the northern hemisphere and faces seasonal shifts of temperature ranging from 0°C in winter to above 20°C in summer [21]. Despite, the maximal growth rate for this species was determined at about 15°C, indicating an optimal ecological and physiological performance at this temperature [22]. Kristiansson and colleagues sequenced a transcriptomic cDNA library of Z. viviparus under unstressed conditions [23]. Furthermore, this species is often compared in ecophysiological studies with its stenothermal Antarctic congener Pachycara brachycephalum [24] that is highly adapted to constant cold waters around 0°C and an optimal growth temperature at 4°C under laboratory conditions [25].
We constructed a transcriptomic cDNA library from liver and heart muscle tissue of the Antarctic eelpout P. brachycephalum under habitat conditions and compared it with a cDNA-library of Z. viviparus at functional level, their amino acid sequence translations and at RNA codon level to uncover differentiation, which we can explain at best by thermal adaptation.

Results
A classification of P. brachycephalum and Z. viviparus by molecular analyses of selected sequence data was used to determine the phylogenetic relationship with various bony fish species from widespread geographical regions. For two mitochondrial genes (ribosomal 16S RNA (16SrRNA) and cytochrome-c-oxidase subunit I (COI)) we computed phylogenetic trees of 13 selected species belonging to the class of Actinopterygii and the elasmobranch Squalus acanthias as outgroup. Both zoarcids formed a clade ( Figure 1A, B). The threespined stickleback Gasterosteus aculeatus appeared as their closest relative for both genes. The obtained phylogenetic trees mirror the grouping into the different families like Cyprinidae, Gadidae, Salmonidae, Tetraodontidae and other Percomorpha. Quantitatively, higher similarities across various species were found for the 16SrRNA, i.e., smaller distances, than for the COI gene. From both trees a close relationship could be inferred among the zoarcid species, although COI is a functional gene and the 16SrRNA is a structural RNA. We expect similar differences to be present also in sequences of functional genes. In this context a manual pre-selection of a limited subset of sequences would bias a comparative study. To this end a more complete approach is followed here based on all available transcriptomic sequences of the closely-related zoarcid fish.

Overall statistical comparison of zoarcid cDNA libraries
The setup comprises 454 sequencing data of two cDNA libraries, one prepared from liver and heart of P. brachycephalum, and one prepared from liver of Z. viviparus [23], generated with similar techniques and protocols (c.f. Material and Methods). The average read length in the library of P. brachycephalum was about 100 bases longer than that for Z. viviparus (Table 1), due to the usage of Titaniumchemistry for sequencing. Subsequently, a higher yield of longer contigs with 487 bases in average for P. brachycephalum was found, compared to 342 bases in average per contig for Z. viviparus (Figure 2A). Other basic parameters like number of reads per contig ( Figure 2B) as well as the average coverage were comparable ( Figure 2C), constituting a suitable basis for alignment analyses of contigs.
Using the BLAST2GO tool [26] for evaluating annotations of contigs, statistics of expect values (e-values) show qualitatively comparable profiles, But quantitatively, sequences of P. brachycephalum show higher similarity to known sequences ( Figure 3A) accompanied by higher coverage per hit of high-scoring segment pairs (HSP) ( Figure 3B). For the purposes of this study we filtered by BLAST [27] for sequences of known functions, i.e., showing known translations, resulting in 19,460 contigs for P. brachycephalum and 16,315 for Z. viviparus.
Functional coverage compared to a phylogenetically related reference genome The overall functional repertoires of the cDNA library sets were evaluated in terms of KOG/COG categories (clusters of orthologous groups) as obtained from the local alignment tool rpstBLASTn applied on the BLAST-annotated sequences against a database of metazoan-specific orthologies (meNOGs) [28,29]. As a scale for the comparison, coding sequences from the closely related species G. aculeatus ( Figure 1A, B) were evaluated accordingly. From the reference genomic data (ENSEMBLE 61) 27,628 coding sequences were extracted to estimate the functional coverage of zoarcid libraries. To compare the appearance of functional terms quantitatively, the counts were pooled into superordinated categories of KOG/COGs ( Figure 4). The count data of both zoarcid transcriptomes were in a comparable range but covered only a fraction of the reference genome. Strongest inter-zoarcid differences were seen for categories F (nucleotide transport and metabolism) and V (defense mechanisms), showing higher counts in P. brachycephalum. The overall coverage compared to the scales of  Figure 1 Phylogenetic trees of representative fish species inhabiting divergent temperature realms. A: Sequence similarity based on the COI barcoding gene. B: Phylogenetic map based on the sequence information of the structural 16SrRNA sequence. Both trees were generated by using the neighbour joining method. Node distances are calculated with Tamura-Nei algorithm. Numbers indicate relative nucleotide substitutions in which different substitution pathways occur with independent probabilities. Sequence gaps were distributed proportional. A summary of the used sequence information is summarized in material and methods Table 3. Overall statistical analyses of the sequencing runs of Z. viviparus data of the library generated by Kristiansson et al. 2009, in comparison to the library of the Antarctic eelpout P. brachycephalum. The resulting reads were set up to contigs with similar assembly techniques. Further analyses of the contig quality could be seen in Figure 2.
G. aculeatus genome is estimated to be 40% to 60%, with a pronounced lesser coverage in categories U, V, D and B. Furthermore, a notable deviation is found in category N (cell motility), where both eelpouts show more counts than are present in the reference genome. Contrary, in category W (extracellular structures) Z. viviparus have more counts compared to the reference set, whereas the number of counts for P. brachycephalum nearly equals it.
To identify species-specific functional patterns in gene expression, we searched for overrepresented meNOGs in each library set. The most exclusive terms in Z. viviparus are meNOG13752 coding for complement protein, meNOG11371 encoding for the cytochrome-c-oxidase (COI) and meNOG18150 for extracellular matrix constituent lubricant protein contributing to the structural extracellular matrix integrity. In P. brachycephalum the most specific term is a b-box-type zinc finger (meNOG24554) followed by meNOG15201 encoding for a tripartite motif protein and meNOG06310 which encodes a carboxylase function. Further terms in this species were ubiquitin (meNOG05057) and the heavy chain of myosin (meNOG 07153) (data not shown). The term for myosin (meNOG07153) is noticed as specific for P. brachycephalum consistent with the fact that we included heart RNA in the library construction.

Amino acid usage
Translations of coding sequence segments were generated based on fish-specific orthologies (fiNOGs) via rpstBLASTn for both cDNA libraries to exclude effects of sequencing  artefacts and frame shifts on the subsequent analyses. From the identified orthologous sequences in each zoarcid fish, matching sequence segment pairs were obtained by harvesting best BLASTp hits per model orthology (i.e. fiNOG), providing a set of shared orthologous sequences in both fish. For a more stringent pairing, i.e., excluding probably false pairs, translated segment pairs were filtered for ≥ 80% amino acid sequence identity. Accordingly, frequencies of amino acids were computed from 4,155 translated and realigned segment pairs, including non-identical amino acid positions, but excluding indels (insertions/deletions). Alterations in amino acid profiles became visible by assessing the total changes in the aligned non-synonymous codon positions ( Figure 5). For P. brachycephalum net losses for glutamic acid and asparagine as well as a net gain of serine were most obviously.
To resolve the exchange patterns in more detail, a table of amino acid replacement counts was transformed to a measure of imbalance of amino acid replacements in P. brachycephalum compared to Z. viviparus (see Methods and Table 2). Similarly, the most striking difference is the loss of 85 glutamic acid in P. brachycephalum with a preferential usage of aspartic acid instead. Further patterns of prominent exchanges are the replacement of 31 leucine by alanine, likewise on a similar level the exchange of 30 glutamic acid residues by glutamine. Serine is used preferentially for almost all amino acids but only replaced by arginine (30 cases). The observed patterns for threonine are not only the preferential usage instead of 82 alanine and 41 asparagine but also replacements for 46 methionine and 31 serine. Beside an overall net loss, valine is preferred over alanine 51 times but also replaced by 65 isoleucine.
The observed imbalances were furthermore reanalyzed taking the ERK-proxy into consideration [30] to determine whether species-specific shifts are detectable within the observed pattern of differentially used amino acids. A significant preference for using E+R+K in Z. viviparus was detected when compared to the orthologous associate in P. brachycephalum (Wilcoxon test, p = 0.02), analysing the 4,155 translated sequences and taking only exchanged positions into account.

Codon usage
The set of 4,155 accordingly re-aligned orthologous coding sequence pairs in the transcriptomic eelpoutlibraries were analyzed by means of a Within Canonical Analysis (WCA) to distinguish synonymous codon usage between the two species [31]. Comparing the composition of the triplets coding for synonymous amino acids, a shift of the P. brachycephalum transcriptome sequence segments became visible in the direction of a preferred usage of A or T in the third position of codons ( Figure 6). The small shift between the blue covariance ellipse relative to the red one is exclusively due to the first principal component, reflecting the G/C-content on the third codon position (GC3). Furthermore, with codon usage differences from 4,155 sequence segment pairs (filtered for > 80% identical translated positions, depicted in   in the left and right panel of this graph reveals a p-value of 5*10 -4 , indicating a significantly non-random distribution of shifts. The overall number of aligned amino acid residue pairs in this set was 414,978, constituting an average of 100 aligned codon pairs per segment with a mean of 96 conserved residues. Comparing 402,219 synonymous codon pairs with conserved nucleotides in positions 1 and 2, P. brachycephalum shows a net GC3-loss of 1,030, re-confirming an imbalance towards a preferential A/T in the third position of the codons in this coldadapted fish. The analysed coding sequence segments showed a mean GC-fraction of 53.196% for Z. viviparus and 53.094% for P. brachycephalum mirroring the observed pattern on the third codon position.

Discussion
In the present study we profiled transcriptomic libraries of two confamiliar fish species from different temperature realms for signatures of thermal adaptation at the sequence level. The phylogenetic analyses ( Figure 1A, B) indicate a close relationship between P. brachycephalum and Z. viviparus, which is supported by transcriptomic sequence comparisons finding a pool of 4,155 comparable sequence segments with a mean identity of 96%. As only minor sequence changes are sufficient to adapt kinetic properties of orthologous proteins, this tight relationship is essential for identifying a potential thermal signature at the genomic and transcriptomic level in species from thermally distinct habitats.
Habitat temperature may be an important trigger for the observed patterns. As Z. viviparus faces similar cold temperatures in winter like P. brachycephalum, a signature for thermal shifts may appear. As optimal growth performance constitutes an important marker for maximal protein synthesis, which was determined for the upper third of the temperature window [22,25], we postulate that the transcriptome is optimized to this temperature range and not to the cold edge, where metabolic activity and performance of the eurythermal species is comparably low.
It is important to note the assumptions we made to argue that the observed changes are adaptive. Garland and Adolph [32] generally argued against comparisons of only two species for making appropriate conclusions about adaptation, since the probability for finding a difference for any trait between two species is always 50% and no valid null hypothesis can be made. According to their critical statement [32], the limitation could be overcome by analysing several traits (about 5 independent) as this will reduce the type I error rate to the excepted rate (p < 0.05). As we used several thousand mRNA/protein sequences for the analyses from the beginning, we can assume that there are still enough independent traits left to reach a p < 0.05. This is essentially true for the analyses of the synonymous codon exchanges, which have no impact on function and can be stated as "neutral" at best. For the non-synonymous codon exchanges we can postulate the same, as the exchange of one specific amino acid in one protein will not cause a definite amino acid in another protein, even if both proteins belong to the same trait. If only a genetic drift has taken place after separation of the two species from the common ancestor, no change in codon or amino acid usage would be expected when analyzing thousands of traits. Since we found a clear signal, we have to conclude that this change is adaptive. The question remaining to be answered is whether the driving force is temperature (alone) or a combination with further factors (see below).
Codon and amino acid usage were found highly correlated with expression levels at least in fast growing bacteria and yeast [33]. Specific needs of an organism may shape a specific codon-usage-profile with a concomitant optimization in the translation apparatus (tRNA amount, stability of secondary structures of RNA, amino acid levels, etc.). Here, we compared qualitatively the profiles of two (related) species. The normalized cDNA libraries were synthesized with the same protocol by the same laboratory but sequenced with different procedures. Nevertheless, both libraries proved to be comparable on the level of transcript diversity ( Figure 2) as well as qualitatively on the level of attributable functions (c.f. Figure 3). Comparable subsets of transcriptomic sequences comprising reliable sequence information for alignments on amino acid and RNA level were generated based on the BLAST results, covering 40 to 60% of functions seen in a related reference genome. In summary, shifts in codon usage between the species seem not biased by expression level per se but possibly by evolutionary forces requiring specific expression levels in a certain (thermal) environment. This aspect is not further addressed within this study.

Functional coverage compared to a reference genome
It is noteworthy that translated coding sequences of G. aculeatus are part of the eggNOG orthologies. Therefore, we expect that a comparison in a subset of these orthologies, i.e., metazoan (meNOGs), is useful to serve as a scale for inter-comparisons of the present cDNA libraries. Nevertheless, comparisons with G. aculeatus genomic coding sequences might show an overestimation of coverage due to the unresolved multiplicities by splice variants present in the assemblies of the    transcriptomes. This fact might explain the observed higher coverage of functional terms in the KOG/COG categories N (cell motility) and W (extracellular structures) of the zoarcid fish compared to the reference genome.
Screening for occurrence of library-specific terms in principle allows for detection of signatures of the environmental conditions under which the transcriptomes were obtained, and not necessarily indicates presence or absence of genes in one of the species genomes. As a result transcripts related to COI were detected in elevated proportion in Z. viviparus, which is against the expectation that the normalization step in library preparation should reduce over-abundant RNA-variants in both libraries to a comparable degree. The same holds for the Antarctic species, for which the frequency of ubiquitin annotation appears in a higher rank. Discovering library specific functional terms, which would be expected to be included in similar ranks in both transcriptomes, reflects differences in the efficiency of normalization.
Other terms, specific for the cDNA library of Z. viviparus are related to complement protein (meNOG13752) and extracellular matrix constituent protein (meNOG18150). These terms summarize functions for immune response and integrity of the tissue, comprising cell communication, growth factors and wound healing. The term for tripartite motif (TRIM) proteins, which is predominantly found in the P. brachycephalum library, summarizes sequences of a protein family associated to pathogen-recognition, regulation of the concomitant transcriptional responses, constituting an important part of the immune system [34]. These differences may indicate adaptive forces acting on the immune system or simply different life histories of the sampled specimens. Responses of the innate immune system in fish were apparently more robust and diverse than in higher vertebrates [35]. Similarly, e.g., Atlantic cod comprises a unique immune system with substantial gene losses in adaptive components without being exceptionally susceptible to disease under natural conditions [36]. Consequently specific adaptations in the immune systems in the two eelpout species due to the different habitat conditions seem possible.
Another overrepresented term in P. brachycephalum is a carboxylase, which belongs to the KOG/COG category I -lipid transport and metabolism, as listed in the meNOG tables. The lipid fraction in liver of P. brachycephalum kept at 0°C is threefold higher than in Z. viviparus at habitat temperature [37], in line with the general finding that Antarctic species have high capacities for catabolism of fatty acids [38]. Furthermore, a recent study demonstrated relationships between preferred energy fuels and transcript levels of respective genes in P. brachycephalum [39]. Although the overrepresentation of the carboxylase fits into the current picture, further genes could have been expected to occur substantiating this evidence. This mismatch may diminish when the entire genome sequence data will become available. In summary, some of the library specific functions identified in our study may point to transcriptomic and possibly genomic contrasts due to the different environmental conditions of the discrete habitats of the two zoarcid fish under study.

Amino acid usage
With the advent of more genomic data from species from various habitats analyses of temperature effects on amino acid usage profiles and the correlated GC content became possible [13,40]. Gu and Hilser [41] analyzed intra-protein interaction energies on the level of primary and secondary structural sequence segments in thermophilic and psychrophilic proteomes, supporting the flexibility hypothesis of cold adaptation. Furthermore, local and global adaptation patterns were differentiated, depending on different protein families.
By comparing the amino acid composition of a comprehensive set of orthologous sequence pairs globally we discovered a distinct pattern of replacements, which may result in structurally important changes in the functional structure of proteins supporting cold adaptation of primary protein structures ( Figure 5, Table 2). In the following section all observed changes of amino acid usage are discussed in respect of the Antarctic species when compared to the Z. viviparus transcriptome.
The most obvious shifts found for P. brachycephalum are the net loss of glutamic acid and asparagine and the gain of serine (compare values of Table 2 and Figure 5). The frequency of all acidic residues is reduced and contrasts with slightly higher frequencies for the basic amino acids lysine and arginine as well as histidine. Due to the eminent loss of glutamic acid, the total amount of charged residues for stabilizing salt bridges is reduced. This may contribute to weaker interaction involved in substrate binding and protein interaction. The net loss of glutamic acid and a preferential exchange with 85 aspartic acid residues preserves the acidic function at reduced flexibility of the side chain. These exchanges may cause increments in charge density at the surface favouring increased solvent interaction as postulated earlier [11,42]. A reduction of salt links results from the exchange of the charged glutamic acid by polar glutamine (30 Q over E), in line with earlier assumptions drawn from analyses of individual proteins [11,42]. Arginine as a bulky amino acid was identified to be less abundant in coldadapted species [11,42]. However, replacements of 30 serine into arginine and a slight increase of arginine on a global level contrasts with both former assumptions.
Based on genomic comparisons Wang and Lercher [30] developed a simple, reduced predictor based on replacement frequencies of the charged amino acids glutamine (E), arginine (R) and lysine (K), which were found most often exchanged in a thermal cline. Higher values for the ERK-proxy are found to be characteristic for hyperthermophiles compared to thermophiles and mesophiles. However, the ERK signal in our study is purely based on the large E change (Figure 5), as a minor change of R+K in the reverse direction does not confirm the ERK-hypothesis perfectly. We interpret this partial contrasting outcome as a consequence of extrapolation, i.e., application of a thermophile-mesophile hypothesis below the mesophile scope seems to be appropriate only for the E signal. Furthermore, we are aware of the finding of Lobry and Necsulea [40] that thermophile-mesophile signatures of cold adaptation are more prominent [43] than mesophile-psychrophile signatures, at least in analyses of codon-usage trends in coding sequences of complete genomes.
Within the group of polar amino acids the picture is heterogeneous due to the large gain of serine and the net loss of asparagine. Comparing the total counts of gains and losses of polar with unpolar residues a gain of 102 polar residues remains. This indicates that despite this pooling unpolar to polar shift -supporting increased solvent interaction-cannot be resolved as a strong signal within our study.
By comparing a limited set of prokaryotic proteome sequences amino acid frequencies canonically discriminating psychropilic, mesophilic and thermophilic species were determined after modelling of candidate genes on existing three-dimensional structures [43]. In this way the analysis of particular orthologuous sequences were extended to part of the proteomes, and amino acid frequencies were assessed separately in structural categories 'buried' and 'surface'. In general, a trend towards polar residues (in particular serine) resemble the finding of a (solely significant) preferential usage of serine and a net gain of polar residues for a cold-adapted species. Our study confirms these findings as a cold induced positive serine shift is one of the dominating signals in our study. Therefore an increased protein surfacesolvent interaction in cold-adapted proteins can be hypothesized, at least on the basis of the net serine signal. However, for a confirmation, further sequence analyses are required to gain structural discrimination of solvent exposure. A diverse pattern is detectable for threonine, accounting to a small net gain similar to serine: the surplus of 82 exchanges over alanine significantly increases the polarity, possibly contributing to less hydrophobic interactions within and improved solvent interactions at the surface of proteins. Similarly, the net gain of threonine over asparagine contributes to the overall reduction of the latter and may support better solvent interaction through increased polarity. A surplus of 30 serine exchanges for threonine was detected. At preserved polarity this exchange is likely to increase the accessibility of the hydroxyl group. In contrast, methionine is preferred over threonine (46 M over T), decreasing the polarity. In summary, the multitude and diversity of exchanges within the polar category of amino acids together with the strong signal found in an earlier study [43] point to their importance for thermal adaptation of proteins.
Within the group of unpolar residues no prominent net gains or losses were detected ( Figure 5). It should be noted that a moderate gain in the usage of proline for P. brachycephalum is detectable, contrasting conclusions of existing views [11,42] as this amino acid causes a large negative impact on structural flexibility. Several amino acid exchanges with conserved functions became apparent in the hydrophobic domain: net gains of 31 alanine in exchange for leucine reflect the preference of shorter residues, i.e., reducing entropic and enthalpic net contributions of side chains to structural stability of protein cores [8,10,11]. The finding of 51 valine over alanine and 65 isoleucine over valine contrast with the global pattern of reduced hydrophobic interaction. However, it is likely that an enlargement in unpolar residue length may enhance the protein-lipid interaction, e.g., within membranes, and must not be in contradiction with the flexibility hypothesis.
As only minor changes in sequence seem to be necessary to adapt kinetic properties, adaptive changes may be difficult to be identified unless proteins of closely related species from different thermal habitats are compared [5,8]. Furthermore, studies based on structural attribution are restricted to proteins with available 3D structures, which is possibly biased towards soluble proteins with enzymatic function as these proteins are overrepresented in common databases. The approach of our study considers all types of expressed proteins within the transcriptome including membrane proteins, structural proteins, etc. without further assumptions.
We claim that our study is less distorted by a probable bias from a specific selection of species, in that we restrict our analyses on model orthologous sequences from two comparable transcriptomes. Furthermore, we focussed on aligned segments, excluding insertions, i.e., we hypothesize that adaptation can be analysed on a purely local basis at single amino acid positions.
In summary, the observed amino acid substitutions represent a global net pattern of molecular shifts in the local, i. e., position-specific amino acid composition for two species of the same family, inhabiting thermally very distinct sites.

Codon usage
A holistic approach for studying temperature-dependent evolutionary profiles on genetic up to proteomic levels was initially subjected to archaea, bacteria and only some eukaryotes on a large thermal scale [14,44], uncovering trends of a higher GC usage for species with a higher OGT. As the GC content reflects the degree of hydrogen bonding in nucleic acid chain molecules, higher GC contents lead to an increased thermal stability in (deoxy-) ribonuclein acid chains [45,46]. Thermal adaptation was analyzed in structural ribosomal RNAs for prokaryotic 16SrRNA finding rising GC contents in species with higher OGTs [47]. In vertebrates similar observations were detected in the 18SrRNA [48] with a trend for higher GC content in endotherm animals compared to poikilotherms showing a correlation to the environmental habitat temperature of the species under study. Similarly, the coding parts of genomes of cold blooded vertebrates and mammals are proposed to be separated by a "major compositional transition" in the GC content, resulting in nearly 100% GC3 levels in mammals [15]. In contrast, genomes of poikilotherm vertebrates are characterized throughout by lower GC3 levels.
Based on the identified segment pairs on the coding sequence level, we focused on a subset of transcript sequences of equal size in both libraries to uncover temperature-related patterns in the codon usage. This filter is supposed to allow for a more specific statistical comparison in that it is not distorted by effects of insertions. We furthermore filtered for highly similar and aligned sequences (80% minimum translated sequence identity) to reduce false pairing. Admittedly, total codon frequencies are on comparable levels for the two species ( Figure 7). However, by aggregating all pairwise comparisons in a single Fisher's test we were able to detect a significant signal (p = 5*10 -4 ) of increased AT3 for the Antarctic eelpout (Figure 7). A set of biochemically meaningful hypotheses may be proposed to explain the latter findings. Firstly, under energetically constrained conditions such as in polar environments, proof reading of DNA might be too cost-inefficient and an increment of A/T in the last codon position encoding for synonymous amino acids might be advantageous without having any impact on protein function. A subsequently lowered GC content is therefore discussed for the cold adapted diatom Fragilariopsis cylindrus (Thomas Mock, personal communication). Secondly, translation itself can be repressed kinetically through increased GC in codons, over-stabilizing enzymatic transition states within the chain of reaction steps of translation in the cold. Thirdly, to avoid over-stabilized secondary structure elements in messenger RNA in the cold lowering GC content could be an evolutionary resort. For example, the molecular function of cold shock proteins (CSPs) has been described to prevent mRNA from frozen, i.e., over-stabilized hairpin conformations in the cold [49]. An adaptation to the cold by means of reducing GC in the mRNA would imply a reduced pressure to express CSPs permanently at high levels, with implications for metabolic cost.

Conclusion
So far, many studies have analyzed subsets of orthologous genomic sequences from phylogenetically diverse species to study principles of thermal adaptation on the sequence level. As adjustments in kinetic and thermal properties of enzymes only need subtle changes in primary structure, high genetic identity is pivotal to uncover possible common principles. The present approach includes a comprehensive set of transcriptomic sequence pairs from confamiliar species inhabiting thermally distinct realms differing by about 10°C OGT. Furthermore, the filtering and alignment procedures allow for local analyses on the biomolecular level.
The analysis of synonymous codons uncovers a significant pattern of a higher A/T content for the Antarctic eelpout P. brachycephalum compared to the North Sea species Z. viviparus. This finding supports the view that cold-adaptation at the DNA/mRNA level took place even on a small thermal gradient in line with results from large scale approaches of distantly related psycrophilic up to hyperthermophilic species [13]. The question arose whether general patterns of cold-adaptation can be identified on the primary structure of proteins. The observed dominating position-specific differences in the primary structure of proteins are pointing to differences in the secondary structure and dynamics, affecting stability and kinetics. Our findings may support the flexibility hypothesis of cold-adaptation, but differences to the existing literature became visible. The former studies revealed some trade-offs for optimization of cold-adapted enzymes like less salt-bridges and more charged side chains for improved solvent interactions. Consequently, cold-adaptation of single proteins uses only a subset of possible alterations. For an understanding of general evolutionary trends in primary protein structure, we provide a more global approach resolving differences between species with similar lifestyles but different thermal adaptation. Protein flexibility at different environmental temperatures may be one adaptation goal whereas for the whole transcriptome other requirements may prevail. The present study should motivate subsequent research with suitable species pairs/groups and full genome coding sequence data to uncover further details of sequence adaptation to the cold.

Phylogenetic analyses of fish
Multiple alignments and phylogenetic trees were generated with MacVector software (Version 10.0.2, MacVector Inc.). For tree construction the neighbour joining method was used, because it avoids assumptions of constant divergence rates among sequences. Node distances were calculated using the Tamura-Nei model, assuming that nucleotide substitutions in different pathways occur with independent probabilities. The sequence data of the genes COI and 16SrRNA were retrieved using the taxonomy browser of the NCBI webpage (http://www.ncbi.nlm.nih.gov/). Detailed information of the used sequences are summarized in Table 3.

Animal collection and sample preparation of Pachycara brachycephalum
Specimens of P. brachycephalum were caught with baited traps at the position 62°10.9' S 58°20.8' W during expedition ANTXV/3 with the RV "Polarstern" in 1998. The animals were brought to the Alfred Wegener Institute in Bremerhaven and kept at 0°C in re-circulated seawater at 34 PSU (practical salinity units). The fish were fed ad libitum with Crangon crangon once a week; feeding was terminated exactly one week before sampling.
For sampling, the fish were anaesthetized by exposure to MS222 (0.2 mg*l -1 ) before being killed. The fish (n=9) had a mean body length of 26 ± 0.83 cm (± SEM) and a mean body weight of 69.77 ± 5.78 g. Tissue samples were quickly excised, frozen instantaneously in liquid nitrogen and stored at −80°C until further processing.
Handling and killing of the fish were conducted in line with the recommendations of the American Veterinary Medical Association (AVMA). The work was approved by a competent German authority (Freie Hansestadt Bremen, reference number 522-27-11/02-00(93).

cDNA synthesis, normalization and pyrosequencing of Pachycara brachycephalum
Total RNA was extracted from 20-40 mg tissue with the Qiagen RNeasy kit with a modified protocol (proteinase K digestion after homogenisation) according to the man-ufacturer0s instruction (Qiagen, Hilden, Germany). Quantity and purity of the RNA were determined using the NanoDrop ND 1000 (Peqlab Biotechnologie, Erlangen, Germany). Integrity of the RNA was analysed by capillary electrophoresis (bioanalyser: Agilent, Waldbronn, Germany). Liver-and heart RNA samples of fish were pooled in equal amounts and used for the synthesis of cDNA.
Preparation of a random-primed and normalized cDNA for pyrosequencing was done with 100 μg of pooled total RNA from liver and heart (Vertis, Freising, Germany). Poly(A)+ RNA was prepared and first strand cDNA was amplified with a N6 randomized Primer. 454 sequencing adapter A (5 0 -GCCTCCCTCGCGCCATCAG-3 0 ) and B (5 0 -CTGAGCGGGCTGGCAAGGC-3 0 ) were ligated to the 5' and 3' ends of the cDNA, followed by an amplification for 21 cycles with a proof-reading enzyme. Normalization was carried out by one cycle of denaturation and reassociation of the cDNA, resulting in N1-cDNA. Reassociated ds-cDNA was subtracted from the remaining ss-cDNA by separation with a hydroxylapatite column. The purified (and normalized) ss-cDNA was amplified with 8 PCR cycles.
The resulting cDNA product was loaded to a preparative agarose gel (1.5%) and fragments in a size range of 450 -650 bp were eluted for sequencing. The application was performed at Eurofins-MWG, Germany according to [50] using the GS FLX Titanium-technique (MWG-Biotech AG, Ebersberg, Germany). High-density pico reactions were performed on a half of a sequencing run.
Quality assessment and assembly of cDNA of P. brachycephalum The sequencing of the normalized cDNA library of Pachycara brachycephalum resulted in 481,802 reads List of species used for generating the phylogenetic trees in Figure 1 with the mitochondrial sequences for the COI gene and structural 16SrRNA with the respective accession numbers. For some species both sections were extracted from the whole mitochondrial genome due to the availability of the sequence information on the NCBI browser.
with an average read length of 321 bases ( Table 1). The reads were preprocessed first with a base calling quality control, followed by a polyA-clipping and a screening for remaining sequencing primer adapters in the 5′as well as in the 3′ends that were truncated. The assembly was done with standard settings for stringency and homology by the Mira Assembler Version 2.9.43 (MWG-Biotech AG, Ebersberg, Germany) according to the instructions by MIRA [51,52]. The high quality reads (338,993) were assembled into 65,565 contigs while 123,038 singlets could not be matched against any other reads. The contigs have a mean length of 487 bases and 22,651 sequences were larger than 500 bases (cf. Figure 2A). The larger contig length compared to Zoarces viviparus may be due to different sequencing protocol as we also used the FLX sequencer with a protocol including Titanium chemistry.

Functional annotation and comparisons of zoarcid cDNA libraries
The assembled contigs of Pachycara brachycephalum were loaded into the free software tool BLAST2GO [26] and BLASTed against the NCBI non-redundant database (BLASTx) [53] with an e-value cut-off of 1.0 -3 and a HSP cut-off length of 33 bases. In total 47,584 Sequences had a BLAST hit, while the e-value distribution ( Figure 3A) and the HSP/hit distribution ( Figure 3B) reflects the good quality of the P. brachycephalum library. We processed the library of contigs of Zoarces viviparus in the same way [23] and found in the 53,459 sequences 35,133 with a BLAST result. After the GOmapping step in BLAST2GO we processed both libraries for annotation with no cut-off for the HSP/hit -coverage, resulting in 19,460 sequences for P. brachycephalum and 16,315 sequences for Z. viviparus. The latter sets of sequences were used for evaluating amino acid usage differences and for synonymous/ non-synonymous codon usage comparisons. For this, we generated partial sequence translations by rpstBLASTn against a collection of fish-specific orthologies (protein sequences from fiNOG from eggNOG version 3) [54] with e-value below 10 -9 , which yields 52,729 (P. brachycephalum and 34,733 (Z. viviparus) hits in a total set of 7,374 fiNOGorthologies. Pairwise alignments between sequences of the two libraries in translated form were then obtained from BLASTP of the obtained fiNOG-compatible translations. 5,352 pairwise alignments (best filtered HSP per fiNOG) with percent-identity above 80% and e-value below 10 -9 were obtained. Coding sequences for the aligned translated segments were aligned accordingly to the protein alignments. Removal of sequence pairs containing stop-codons and/or ambiguity nucleotide-codes yielded a set of 4,155 segments for further analyses. GC3 comparisons are based on aligned synonymous codon positions with unchanged nucleotides in the first and in the second codon positions.
Amino acid usage has been analysed from these alignments as well as synonymous codon usage (Within Canonical Analysis), both in R [55] with the packages seqinr and ade4 (after [31]). The re-aligned sequence segments contain pairwise information that can be compared in a position-specific way, allowing to quantify preferred amino acid usage, e.g., by counting amino acid specific replacement frequencies for amino acid i in P. brachycephalum to amino acid j in Z. viviparus at the same position in the alignment. A non-synonymous replacement imbalance is computed from the matrix.
For a functional characterisation of the libraries, rpstBLASTn was used against the database of metazoan orthologies computed from eggNOG-alignments (version 2, [29]). For comparison with a reference genome of a related fish, we analysed the coding sequences from the genome of G. aculeatus (BROADS1, ENSEMBLE Version 1.63 at EMBL) in the same way. Best hits to meNOG orthologies with e-value below 10 -20 were kept, pooling functional annotations into COG/KOG categories.