Skip to main content

Gonadal transcriptomics elucidate patterns of adaptive evolution within marine rockfishes (Sebastes)

Abstract

Background

The genetic mechanisms of speciation and adaptation in the marine environment are not well understood. The rockfish genus Sebastes provides a unique model system for studying adaptive evolution because of the extensive diversity found within this group, which includes morphology, ecology, and a broad range of life spans. Examples of adaptive radiations within marine ecosystems are considered an anomaly due to the absence of geographical barriers and the presence of gene flow. Using marine rockfishes, we identified signatures of natural selection from transcriptomes developed from gonadal tissue of two rockfish species (Sebastes goodei and S. saxicola). We predicted orthologous transcript pairs, and estimated their distributions of nonsynonymous (Ka) and synonymous (Ks) substitution rates.

Results

We identified 144 genes out of 1079 orthologous pairs under positive selection, of which 11 are functionally annotated to reproduction based on gene ontologies (GOs). One orthologous pair of the zona pellucida gene family, which is known for its role in the selection of sperm by oocytes, out of ten was identified to be evolving under positive selection. In addition to our results in the protein coding-regions of transcripts, we found substitution rates in 3’ and 5’ UTRs to be significantly lower than Ks substitution rates implying negative selection in these regions.

Conclusions

We were able to identify a series of candidate genes that are useful for the assessment of the critical genes that diverged and are responsible for the radiation within this genus. Genes associated with longevity hold potential for understanding the molecular mechanisms that have contributed to the radiation within this genus.

Background

Genomic information can increase our understanding of the molecular evolutionary processes that drive speciation [1]. Comparative genomic and transcriptomic studies have provided a framework for understanding how genes and genomic sequences relate to adaptation and phenotypic evolution at the organismal level [2]. Many of these comparative studies [1, 3, 4] identify coding genes that are subject to rapid divergence and positive selection, a process where mutations are advantageous and favored. Either a single mutation or an accumulation of advantageous mutations can contribute to the process of adaptive evolution. The identification of positive selection at the molecular level has been frequently estimated by the calculation of nonsynonymous (Ka) and synonymous (Ks) substitutions, in which a Ka/Ks ratio greater than one is an indication of positive selection and a value less than one is indicative of negative or purifying selection, the purging of deleterious alleles [57]. Genes under positive selection are generally categorized within comparative genomic studies under processes such as biosynthesis, development, metabolic processes, immune function and reproduction [1, 3, 4]. As more genomic and transcriptomic information becomes available, we can reaffirm or redefine which processes are pertinent to the processes of adaptation and speciation.

Identifying the mechanisms of speciation within marine systems has been a daunting and difficult task. Most studies, post Mayr [8], have focused on identifying geographic barriers that would prompt allopatric speciation [9]. However, within marine ecosystems there are limited geographic barriers that would prevent allopatric speciation [10]. This concept suggests a marine-speciation paradox, where incipient species that come into contact frequently would prevent allopatric speciation [11]. Rockfishes (genus Sebastes), an example of adaptive radiations within marine ecosystems, provide an ideal model system for understanding the mechanisms that contribute to the speciation process. The rockfish genus Sebastes provides a unique model system for studying adaptive evolution because of the extensive diversity found within this group, which includes variation in morphology, ecology, and a broad range of life spans [10, 12]. This rapid radiation is supported by multiple studies which demonstrate the diversification of this group from a phylogenetic context [1315]. Ingram [10] showed evidence that rockfish speciation is associated with the divergence of habitat depth and depth-related morphology, which supports that this group of fishes are undergoing ecological speciation along an environmental gradient. Additionally, complex courtship displays and internal fertilization are found within rockfishes, making assortative mating likely [10, 16] and can help us understand how sexual selection is operating within this group.

Divergent sexual selection can facilitate the speciation process via reproductive traits that form a barrier between incipient species and result in reproductive isolation [1719]. Other factors like spawning time, mate recognition, environmental tolerance, and gamete compatibility are thought to contribute to the marine speciation process [20]. Several molecular evolutionary studies have demonstrated that genes associated with reproduction (i.e. genes that encode for gamete recognition proteins) have rapidly diverged between closely related taxa [2123]. Swanson and Vacquier [18] suggested that the rapid divergence within reproductive genes may stem from a single or combination of selective pressures such as sperm competition, sexual selection and sexual conflict. Levitan and Ferrell [24] demonstrated how sperm competition operates within male and female sea urchins (Strongylocentrotus franciscanus) in which mating pairs that had the most common bindin (sperm protein) genotypes had higher reproductive success in the presence of low polyspermy—the fertilization of an egg with multiple sperm. However, when polyspermy levels were high, males and females with unmatched bindin genotypes had the selective advantage. This depicts an “arms race” between sperm and egg proteins, in which sperm competition is a source of directional selection, and egg proteins are also under selective pressures to develop barriers against polyspermy [25]. Although a vast amount of information supports the rapid diversification of reproductive genes, very little is known about the forms of selection operating on these genes [26]. Most studies on gamete evolution within marine systems have been performed with free spawning organisms [21, 24, 2627]. In contrast, marine rockfishes have matrotrophic viviparity, the process where the eggs are fertilized internally and the mother provides nutrition to the developing embryo and the offspring are expelled as larvae [28]. The latter is not a common life history trait in a majority of extant bony fishes. The evolutionary processes of gamete recognition proteins within this group are unknown. However, multiple paternity has been demonstrated within multiple species within Sebastes, including S. goodei [29, 30], which permits the opportunity for selective forces to operate on reproductive proteins (e.g. sperm competition and the prevention of polyspermy).

A prime candidate for understanding reproductive barriers at the molecular level is the zona pellucida (ZP) gene family, which encodes for glycoproteins that create the acellular vitelline envelope around the oocyte [3133]. The function of ZP proteins varies in fishes and includes uptake of nutrition, functional buoyancy [34], protection of the growing oocyte, species-specific binding, and guidance of the sperm to the micropyle [35]. There are at least eight ZP genes in many fish species [36] that belong to three subfamilies: ZPB, ZPC, and ZPAX [28]. The subfamily ZPA is missing from fishes, which may be due to a gene deletion [37] and subfamilies ZPC and ZPB are known to contain gene duplicates [38]. Selection has been tested in ZPC genes in six teleost species, but the results have been inconclusive due to the lack of robustness in the statistical methods used [39]. In this study, we wanted to address more closely the hypothesis that genes in the ZP family may provide a reproductive barrier between closely related species.

Rockfishes (genus Sebastes) are a prime system for understanding adaptive radiations and the mechanisms of speciation within marine systems [40]. Adaptive radiations involve rapid divergence of multiple lineages, which serve as replicates of speciation within a given environment or time frame [40]. Sebastes spp. has been considered an ancient species flock [14], a group of closely related species with a monophyletic origin [13]. The genus arose around 8 mya, contains 22 recognized subgenera [41], and approximately 105 species found worldwide [13]. Aside from being a diverse group of fishes, there is an extensive difference in lifespans within rockfishes; the shortest-lived rockfish species is calico rockfish (S. dalli) at 12 years and the longest-lived rockfish is rougheye (S. aluetianus), which have a maximum lifespan of 205 years [28]. In addition, this genus is composed of species that are morphologically and ecologically divergent [10, 42], with the center of diversity for this group being located in the Northeast Pacific [43]. Though many studies have concentrated on describing species-level variation [1315, 44], very few studies have investigated the genetic mechanisms that have contributed to this radiation [45, 46].

In this study, our aims were to identify and characterize genes subject to positive selection between two marine fish species in Sebastes. We used a comparative transcriptomic approach, in which we characterized and compared transcriptomes generated from gonadal tissues of the two species S. goodei and S. saxicola. We selected S. goodei (chilipepper) [47] and S. saxicola (stripetail) [48] based on the extensive amount of evolutionary time since their most recent common ancestor (estimated to be greater than 6 million years ago [mya] [13], which can give a broader depiction of which functional genes have diverged within this genus. In addition, gonadal tissues were selected for this study to locate highly divergent reproductive genes, which can serve as candidates for investigating positive selection across the entire genus. Our reasoning for the two different sequencing methods (Sanger and 454-pyrosequencing for S. goodei and S. saxicola, respectively) was that each library was prepared with the latest sequencing technology that was available at the time. We annotated the function of expressed genes using gene ontology (GO), and identified signatures of positive selection from estimates of Ka/Ks ratios for ortholog pairs that we annotated between these two species. Genes that were found to be evolving under positive selection were further analyzed in the context of their orthologs in model fishes and ESTs from our earlier study [46]. In addition to identifying selection through analysis of coding regions, we additionally estimated genetic divergence between the two species in untranslated regions (UTRs). Overall, this study was developed to understand how differences at the transcriptomic level contribute to adaptive evolution within this speciose group.

Results

Sequence statistics and annotation

The S. goodei ESTs contained 2370 and 13,824 raw sequences respectively and a mean EST length of 655.9 and 614 bp respectively (Additional file 1). We assembled 6139 unigenes, which were composed of 664 singletons and 630 contigs from ovary tissue, and 2849 singletons, and 1996 contigs from testes tissue. When processed through a second run of cap3 [ 49 ], the 6139 sequences were reduced to 5336 contigs and used for our comparative analyses with S. saxicola.

The S. saxicola ESTs contained a primary assembly of 311,289 reads and 295,114 clean reads. The primary assembly contained 85,431 singletons and 51,310 contigs with 71 % redundancy (Additional file 2). From these 136,741 sequences, a second assembly was processed and contained 41,174 singletons, 14,090 primary contigs, and 23,475 secondary contigs. Only sequences that were assembled into contigs and greater than 300 bp were used for our comparative analyses resulted in 3112 primary contigs and 15,393 secondary contigs were used with a total of 18,505 contigs.

There were 2480 and 8763 sequences from S. goodei and S. saxicola datasets respectively that were annotated. Within the S. goodei and S. saxicola datasets, there were Gene Ontologies (GO) terms within the biological process domain, that belonged to the cellular process, metabolic process, biological process, multicellular organismal process, developmental process, cellular component organization, response to stimulus, localization, signaling, cellular component biogenesis, reproduction, death, growth, cell proliferation, immune system process, and multi-organism process. Most GO terms represented for molecular function pertained to binding, catalytic activity, transcription regulator activity, molecular transducer activity, transporter activity, enzyme regulator activity, structural molecule activity, and electron carrier activity. The majority of GO terms represented for cellular component pertained to the cell, organelle, macromolecular complex, membrane-enclosed lumen, extracellular region, and synapse.

Our annotations of the two (S. goodei and S. saxicola) transcriptomes were relatively similar across the major three divisions (Biological Process, Molecular Function, and Cellular Component) when levels 2 and 3 GO terms were compared. In most GO terms, S. goodei were slightly elevated, with 2480 annotated contigs and for S. saxicola-8763 contigs. Although there were differences between the two sequencing methods, there were similarities in GO categories between the two transcriptomes. In addition, the two datasets showed 16 % (S. saxicola) and 17 % (S. goodei) of GO terms annotated to reproduction and 35 % and 39 % for developmental processes (respectively), which may provide an overview of reproductive processes within ovary tissues. In addition, the dual use of testes and ovary tissues from S. goodei contained similar GO terms between S. saxicola in which these two tissue types may contain similar GO functional traits.

Genes under positive selection

Two hundred and nine ortholog pairs contained a Ka/Ks less than 0.1, 726 pairs were between 0.1–1.0 (Ka/Ks) and 144 pairs that were found greater than one (positive selection; Fig. 1), which amounts to 1079 orthologs in total. Seventy-one of these pairs were annotated with a majority of the sequences that were associated with macromolecule metabolic processes and regulation of biological processes based on the sequence distribution of Gene Ontologies (Table 1). Only a small fraction of the distribution of GOs were associated with reproductive process (11 orthologous pairs) and sexual reproduction (8 orthologous pairs). The average Ka/Ks value was 0.53 (s.d. = 0.62), and the average ortholog alignment length of 361.37 (s.d. = 132.45). There was no enrichment found between these two categories with a False Discovery Rate of (0.05).

Fig. 1
figure1

Plot of (Ka) nonsynonymous vs. (Ks) synonymous substitutions. Blue diamonds indicate values with a Ks < 0.1, whereas red triangles indicate Ks values greater than 0.1 but less than 0.5. The black line suggests neutrality, values above the line are subject to positive selection and values below are subject to purifying selection

Table 1 S. goodei and S. saxicola ortholog pairs that were identified as positive selection

PAML analyses and zona pellucida phylogeny construction

From 11 of the 71 annotated genes found to be under positive selection in our first PAML dataset, the LRTs conducted showed that there was no significant difference between models M7 and M8. From the second dataset, which contained our two rockfish species of interest as well as S. caurinus and S. rastrelliger, only two out of the four were identified to be under adaptive evolution (M8 was significantly different from M7 when using the LRTs). The two genes were FKB12 and TM50a, which contained five and four sites under positive selection respectively. In our third dataset that was composed of five ZP genes from our two rockfish species, Oreochromis niloticus and Oryzias latipes did not demonstrate signatures of positive selection according to our LRTs analysis (Table 2).

Table 2 PAML analyses of candidate and ZP genes with M7 & M8 models

In our construction of the gene family for ZP within rockfishes we first identified 18 and 26 ESTs that contained ZP annotations for S. goodei and S. saxicola respectively. Maximum likelihood (ML) trees were constructed with 143 ungapped a.a. sites (1075 total sites) and 92 ungapped a.a. sites (697 total sites) for the ZPAX and ZPB, and ZPC respectively (Figs. 2 and 3). In our phylogenetic analysis, seven ZPC, 2 ZPB, and one ZPAX homologs were identified (Figs. 2 and 3). Some ESTs were excluded from this analysis (two ZPB fragments), because these fragments did not align with the majority of the remaining sequences, however, they were included in the Ka/Ks analysis. From the PAML analysis, five ZP ortholog groups were compared. This was based on the ortholog groups identified (Sebastes sequences, Oryzias latipes, and Oreochromis niloticus). The Ka/Ks comparison was conducted with ten ZP genes (six ZPC pairs, three ZPB pairs and one ZPAX pair), where only one pair (ZPB homolog) was identified under positive selection (Table 3).

Fig. 2
figure2

ML tree generated for ZPAX and ZPB genes found within S. goodei and S. saxicola with 1000 bootstrap replicates. Additional teleost species were used to construct this phylogeny, and bootstrap values greater than 70 are displayed

Fig. 3
figure3

ML tree generated for ZPC genes found within S. goodei and S. saxicola with 1000 bootstrap replicates. Additional teleost species were used to construct this phylogeny, and bootstrap values greater than 70 are displayed

Table 3 Pairwise Ka/Ks estimates for ZP ortholog pairs

UTR divergence

Based on 1079 pairwise comparisons (orthologous pairs) between the two rockfish species the average Ka was 0.034 (s.d. = 0.053) and an average Ks value of 0.067 (s.d. = 0.077) by using the YN model. The untranslated region (UTR) divergence estimates between the two fishes were based on 311 and 192 pairwise comparisons for 5’ and 3’ UTRs respectively. The 5’ UTR estimates with a Jukes-Cantor correction were 0.026 (s.d. = 0.025) and Ks values (Jukes-Cantor correction) from the corresponding coding sequences was 0.063, (s.d. = 0.068). The 3’ UTR average was 0.023 (s.d. = 0.024) from the 193 corresponding coding sequences and contained an average Ks value of 0.076 (s.d. = 0.089). Overall, the means for the UTRs were statistically less than the means from the Ks values and there were no clear relationships between UTRs and Ks values. In a pair-wise simple t-test and the Wilcoxon rank sum test there were only two comparisons that did not show any mean differences (5’ UTR ends vs. 3’ UTR ends, and 5’ Ks from coding regions vs. 3’ Ks from coding regions - Table 4).

Table 4 Pairwise analyses of sequence divergence

Discussion

This study identifies genes under positive selection between the gonadal transcritomes of two distantly related rockfish species (S. goodei and S. saxicola). 1079 orthologous gene pairs were identified between the two species and of these we found 144 genes under positive selection. Genes found under positive selection did not overlap with the genes found in a previous Sebastes comparative transcriptome study [46], which is not surprising given the differences in the tissue types. However, we did find similar functional traits (metabolism, immune function, and longevity) for genes under selection. In the earlier study, the transcriptomic analysis was conducted with brain, pituitary, kidney, and spleen tissues, which may differ in expression patterns from gonadal tissues. Gene expression among different tissue types is still being teased apart, as genes once thought to be expressed in a tissue specific fashion have been identified in multiple tissues [50]. In addition, the examination of a set of candiate genes from the zona pellucida gene family did not reveal strong signs of positive selction, as has been found in other vertebrates. Lastly, we used divergence estimates of the UTRs to further support that orthologs were identified for our study and not paralogs. As more rockfish tissue-specific transcriptomic information becomes available, the determination of whether certain genes subject to positive selection belong to specific tissues can be determined. This information allows us to better understand how reproductive genes have contributed to the process of adaptive radiation within this group of fishes.

Comparison of the two datasets

The combination of Sanger and 454 sequencing technologies have been beneficial for increasing the amount of transcriptomic information available for non-model species [51]. In rainbow trout (Oncorhynchus mykiss), the combination assembly of Sanger and 454 sequencing showed high similarities with other fish species that have their genomes sequenced [51], which provides support that the combination of the two technologies do not generate disparities or conflicting information. Caveats seen with 454 sequencing is that singletons contained elevated insertions in mycorrhizal fungi [52], and also high errors rates have been found within homopolymer repeats [53]. We did not include singletons in our study and we saw very similar annotations for the two datasets. In addition, we were able to obtain a substantial number of orthologs between the two species datasets (1079 pairs) which suggests that the different sequencing technologies did not hinder our analyses.

Natural selection

Our scan for genes under positive selection also includes genes with elevated Ka values (Ks = 0) that contained GO terms that were associated with adult life spans and gamete function/production. Genes with only nonsynonymous substitutions and assigned with GO terms associated with gamete production/function were the t-complex protein 1 and lissencephaly-1 homolog. Study on zona pellucida – 3 (homologous to ZPC in fishes) and the t-complex protein 1, and immune system protein β 2 m in a group of closely related murine species (genus Mus) contain sites under positive selection [54]. T-complex protein 1 is expressed during spermatogenesis in murids [48], but the specific function is still unknown. This gene is highly expressed within mouse testes and is suggested to maintain normal spermatogenesis. Lissencephaly-1 has been demonstrated to be conserved [55] when compared between mice and humans. This gene has been shown to demonstrate infertility when a homozygous mutant has been developed [56]. The likely scenario for elevated Ka values found in these genes is because these are only fragments of the entire gene sequence. These genes would be interesting to examine at the population level within each respective species in order to determine whether there is variation found at both synonymous and nonsynonymous sites.

Ortholog pairs under positive selection with a Ka/Ks > 1 and GO terms associated with gamete production/function were deadenylating nuclease, DNA ligase III, DNA mismatch repair protein, eukaryotic translation initiation factor 2, and homolog subfamily a member 4 (Table 1). DNA repair mechanisms have a strong relationship with gametogenesis, where the genomes of gametic cells are subject to mutations following recombination [57]. Within these gametic cells the repair mechanisms have to tolerate mutations that occur during gametogenesis which result in specialized functions [57] that are possibly due to selective pressures. Deadenlyating nuclease has been suggested to silence maternal mRNA during oocyte formation [58], this is particularly interesting due to the transcript comparison between S. goodei testes and S. saxicola ovary tissue. Homolog subfamily a member 4 is known to be part of the DnaJ family, which is assigned to the structurally unrelated protein family of Heat Shock Proteins (HSPs) [59]. In humans, this gene is expressed in brain tissue, but many homologs within the family are associated with sperm motility. Recent study has shown there are differences in reproductive genes between infertile vs. fertile human males, in which DnaJ subfamily A was represented [60]. Clearly, these genes need to be further investigated to understand the mechanistic and functional properties within rockfishes to understand how these genes are subjected to positive selection.

Within our scan for positively selected genes we identified genes associated with longevity. Although the two species have similar lifespans, there are extensive differences between life spans across species within the genus [61], and genes associated with longevity were identified within our previous study [46]. The congener closely related to S. goodei is S. paucispinis [13], which can live to at least 46 years [61]. By comparison, the nearest congener to S. saxicola is S. semicinctus, which can live up to 15 years [43]. The genes identified here and associated with longevity were eukaryotic translation initiation factor 2 subunit 3, cytochrome c oxidase subunit 5a, 40s ribosomal protein x isoform, and 60s ribosomal proteins L9 and L17, and protection of telomeres protein 1 [6264]. These genes associated with longevity are particularly interesting and hold the potential key for understanding how aging operates in this group of fishes. As more rockfish genomic information becomes available this will provide a clearer depiction of the patterns of longevity and how this may impact adaptation.

The genes that showed evidence of positive selection in our PAML analysis were 12-kDa FK506-binding protein (FKBP12) and transmembrane protein 50a (TM50a). FKB12 is known to be associated with various cellular functions that include apoptosis, cell-cycle progression, and calcium release [65]. Genes that encode for the mechanisms of apoptosis have been suggested to be under positive selection [2]. Speculation for why these genes are under selection is due to the genomic conflict that would occur as a result of apoptosis during spermatogenesis [66]. As for TM50a, this gene encodes for a membrane protein and the function of this gene within fishes is unknown. There is more information needed to determine how these genes contribute to adaptation within Sebastes.

Other comparative transcriptomic analyses of candidate systems for adaptive radiations, such as crater lake cichlid fishes [1] and East African cichlid fishes [67], showed a limited number of genes found under positive selection, which was less than 1 % and ~ 2.7 % respectively, (both were less than what we found in our study ~ 13.3 %). From these studies on cichlids, some of the genes under positive selection that were comparable to our study were: transmembrane protein, cytochrome c oxidase, lipid phosphate phosphohydrolase, ribosomal proteins from Baldo et al. [67] and RNA-binding from Elmer et al. [1]. These genes would be of interests to investigate further since they are found under positive selection within multiple examples of adaptive radiations, which includes our study.

Currently, there is much debate over the assessment of natural selection at the molecular level. However, one of the limitations to these analyses are that current statistical methods estimate Ka/Ks across an entire gene and does not account for the relaxation of purifying selection, and/or the effects of population bottlenecks [68]. In addition, estimates of Ka/Ks demonstrate a conservative estimate of positive selection, because most of the protein is under a functional constraint and only a few amino acid sites would be subject to positive selection [2]. However, within many comparative genomic studies there are genes that have been identified under positive selection which encode for proteins with immune or reproductive functions [4, 69]. Although there may be difficulties detecting selection, there are reoccurring gene functions that are subject to positive selection. Within our study, we have identified certain genes under positive selection that encompassed a broad range of GO terms where a majority of terms include: cellular process, metabolic process, biological regulation, response to stimulus, multi-cellular organ process, cellular component organization, developmental process, localization, signaling, and reproduction. The specifics about how these genes under positive selection contribute to adaptation within heterogeneous environments remains unknown, but provides a suite of candidates for understanding why these genes have been identified as nonsynonymous substitutions in comparison to the remainder of the transcriptome.

Zona pellucida

Current evidence shows that there are six subfamilies of zona pellucida genes in vertebrates (ZPA/ZP2, ZPB/ZP4, ZPC/ZP3, ZPD, ZPAX, and ZP1) [38] and these are homologous with the ZP domain found within invertebrates [70]. Our phylogenetic construction of the ZP family suggests there is only one ZPAX gene, two (putatively four) ZPB homologs, and seven (ZPC/ZP3) homologs in our dataset. Most ZP genes within the rockfish genome grouped with Oreochromis niloticus and Oryzias latipes, which suggests these genes have arisen in a similar pattern from a recent common ancestor (Figs. 2 and 3).

In our estimation of Ka/Ks of ten ZP gene pairs most pairs contained a broad range of Ka/Ks values (Table 3) with only one ortholog pair that was subject to positive selection (ZPB homolog 1). Both ZPB homologs (1 and 2) were not used to construct phylogenetic trees because these sequences provided limited phylogenetic information (weak bootstrap support) and were shorter than the sequences used for our phylogenetic analyses. However, these genes are divergent from the remaining ZP homologs and an ortholog from one of the model teleost could not be detected. It is unknown if some of these ZP homologs are specific to the Sebastes lineage, where more information from species within this genus and closely related genera or families would be needed to make this assessment. Currently, there is no evidence of teleost ZP genes subject to positive selection [71], however this was assessed with a select few model fishes (i.e. Danio rerio, Oryzias latipes, Gasterosteus aculeatus, Tetraodon nigroviridis, and Takifugu rubripes). This poses the question of whether there is enough evidence to show that ZP genes do not provide evidence for positive selection within teleost or is there some other mechanism that would prompt reproductive barriers? These methods are more stringent at identifying selection and the addition of more taxa from Sebastes can provide insight on how these genes have contributed to the radiation within this group.

UTR analysis

Untranslated regions (UTRs) provide a reference of divergence between species and can be utilized as a base for comparing synonymous substitutions within coding regions that are assumed to be evolving neutrally. Our estimation of 3’ and 5’ UTR divergence is unprecedented within the genus Sebastes. Our estimated values of UTR divergence between S. goodei and S. saxicola were not statically similar to the Ks values (from 5’ and 3’ sequences, Table 4). In addition, the utilization of the cutoff mark (Ks < 0.1) is not an essential benchmark for the removal of aligned pairs as putative paralogs according to our UTR analysis (Fig. 4). Interestingly, the Ks coding region and 3’ UTR divergence between crater lake cichlid fish species contained rates of 0.0250 and 0.0252 (with a Jukes-Cantor correction) respectively which had a common ancestor ~ 10,000 years ago [1, 72]. This provides an interesting comparison of freshwater (cichlids) and marine fishes (rockfishes), where UTR divergence was similar between cichlids and rockfishes but Ks values were different. Hurst [73] suggesteded that synonymous rates are relatively proportional to the neutral mutation rate, which suggests that the UTRs and Ks are relatively close to this rate. However with species that are more divergent, there are distinct differences between synonymous rates and UTR divergence. Divergence within closely related Drosophila species are distinct where 3’ UTR and 5’ UTR rates are lower than synonymous sites when comparing D. melanogaster and D. simulans [74], which diverged ~ 2–3 mya [75]. Our study did not have similar Ks and UTR rates as compared to the Elmer et al. [1] study, which may be due to the amount of time since divergence (estimated 6 mya). Suggestions have been made that lower UTR divergence in comparison to synonymous sites in Drosophila is likely to be subject to negative selection, which is consistent with our findings [74]. This pattern of 3’ UTRs subject to purifying selection has also been identified within chimpanzees and humans [76]. More evidence will be required to demonstrate the impact of negative selection on the marine rockfish genome, which analyzing the UTRs from closely to distantly related congeners can provide insight on this evolutionary pattern.

Fig. 4
figure4

Comparison of UTR divergence with alignment length and Ks divergence. Blue diamonds indicate ortholog pairs with a Ks > 0.1, whereas red triangles indicate Ks values that are greater than 0.1 and less than 0.5

The use of the UTRs has been a useful indicator for assigning the correct ortholog pairs as opposed to paralogs, in addition to the algorithms used in inparanoid [77]. Depending on the function of the gene, UTRs can be highly conserved between orthologs and divergent between paralogs once a gene duplication event has occurred, which has been demonstrated between humans and mice [78]. One of the many difficulties of identifying orthologous gene pairs within teleosts is the proposed fish specific genome duplication (FSGD) event which occurred ~ 350 mya [79]. This event provides a plethora of gene duplicates that may operate under different evolutionary pressures such as subfunctionalization, neofunctionalization, and pseudogenization. With this magnitude of gene duplicates, the assignment of orthologous gene pairs can be difficult because of the amount of duplicates that are closely related. In our study, we showed lower rates of divergence within the UTR region in comparison to the synonymous sites of these two species. If we constructed an alignment of UTRs from a pair of paralogs in which the paralogs arose due to the FSGD, then there would be an expected high degree of divergence as opposed to the divergence rate of true orthologs. However, exceptions may occur with recent gene duplications and/or concerted evolution permits for paralogs to be subjected to similar selective pressures. If we can detect novel genes within this genus we can gain a better perspective of the rate of divergence occurring within the UTR region. Understanding the importance and evolutionary patterns of novel genes is a promising avenue with the advent of next-generation sequencing.

Conclusions

This transcriptomic study between S. goodei and S. saxicola provides a template for understanding evolutionary processes at the molecular level within Sebastes. We identified a series of candidate genes that are useful for the assessment of the critical genes that diverged and are responsible for the radiation within this group. Genes that pertain to longevity hold potential for understanding the molecular mechanisms that have contributed to the radiation within this genus. The establishment of genes under positive selection from this study can be insightful and utilized to assess whether these positively selected genes are under selection across the entire genus Sebastes. If these genes are under positive selection across the entire genus, this will provide new clues about how natural selection is contributing to speciation by reproductive isolation within this group. This study was intended to further advance the field of evolutionary biology by providing support of which functional genes are important for adaptation and sexual selection. With transcriptomic data from multiple species within Sebastes, we can identify the repeated patterns of adaptive evolution and elucidate our understanding of how adaptation and the speciation processes occurred across the entire genus of Sebastes.

Methods

EST sequencing and assemblies: S. goodei

A portion of the ovary and testes were collected from fresh dead S. goodei individuals (one per sex), placed immediately in RNAlater, and stored at −80 °C. The National Oceanic and Atmospheric Administration (NOAA) Fisheries, Southwest Fisheries Science Center, Santa Cruz, California collected samples under a salvage permit. DNA (cDNA) isolation and library construction was performed by BIO S&T (Montreal, Canada). Total RNA was extracted with TRIzol (Invitrogen, Carlsbad, CA), and cDNA was synthesized according to the SMART cDNA library construction kit (Clontech, USA). The resulting cDNAs were full-length enriched, and possess SfiI A&B at the 5’ and 3’ ends which facilitated directional cloning. Double-stranded cDNAs were obtained by primer extension. Double-stranded cDNAs were digested with Sfi-I, afterwards only fragments greater than 0.5 kb were purified with a gel purification kit.

Purified cDNA was ligated to SfiI-digested and Calf intestinal phosphatase (CIPed) vectors by overnight incubation at 16 °C. The ligation mixture was desalted and electroporated in ElectroMax DH10B cells (Gibco-BRL, USA). Quality control (average cDNA insert sizes and recombinant rate) was performed prior to mass transformation. Transformed cells were distributed into 96-deep-well plates for amplification at about 2300 recombinants per well.

Cells were plated onto LB-agar (amplicillin and x-gal) plates. Clones were prepared for sequencing in two ways. Method 1 – positive colonies were picked directly into 96-well plates that contained LB broth + ampicillin. Cultures were grown overnight at 37 °C with moderate shaking. The Montage Plasmid Miniprep HTS kit (Millipore) was used to isolate plasmid DNA. Sequencing on purified plasmid DNA was done with M13 (−20 and +40) primers at JGI, which was conducted in another study [80]. Method 2 – cDNA libraries were produced by double-stranded cDNA, which was size fractionated to obtain long reads. Afterwards, cDNA inserts were cloned into the vector pExpress1 (Express Genomics, Frederick, MD), and electroporated into E. coli strain DH10B. Libraries contained ~ 96 % recombinants with an average insert size of 1.95 kb. Libraries were sequenced on 96-well capillary sequencing platforms (ABI 3700) located at the DOE Joint Genome Institute (JGI, Walnut Creek, CA) and at the Genome Core Facility at the University of California, Merced, CA.

Expressed Sequence Tags (ESTs) were cleaned and assembled with an automated pipeline (EST2uni) [81], which includes base calling (phred), vector trimming and low quality bases removal with lucy [82], and repeat masking with repeatmasker-open 3.0 [83]. Afterwards, the assembly of sequencing reads into unique consensus sequences (unigenes) [81] was conducted with cap3 [49], and functional annotations were conducted with blast [84], in which the hits are then parsed so that a description is listed for each unigene. The unigene datasets are composed of high quality and clean sequences, which are assembled into contigs and singletons [81]. These S. goodei sequences can be found at Genbank with the following accession numbers [Genbank: JZ693907-JZ704944]. Unigenes were processed again with cap3 to correct for putative assembly errors and then used for the comparative transcriptomic analysis against the S. saxicola dataset.

EST sequencing and assemblies: S. saxicola

Ovary tissue was collected from a single fresh dead S. saxicola individual, placed immediately in RNAlater, and stored at −80 °C. The S. saxicola individualwas also collected by NOAA Fisheries, Southwest Fisheries Science Center, Santa Cruz, California under a salvage permit. Complementary DNA (cDNA) isolation and library construction for 454 sequencing was performed by BIO S&T (Montreal, Canada). The library was sequenced at the University of South Carolina Environmental Genomics Core facility on a Roche 454 sequencer. The library was sequenced on a ½ of a titer plate.

The S. saxicola raw reads and base quality information from the 454 GS FLX sequencing run were first extracted and clipped using the sff_extract 0.2.8 [85] script. Further removal of adaptors and contamination, such as low quality bases and poly (A) stretches, was achieved by using snowhite 1.1.4 [86], a pipeline that implements aggressive cleaning with seqclean (http://sourceforge.net/projects/seqclean/) and tagdust 1.12 [87]. Reads were then processed through repeatmasker-Open 3.0 using the cross_match (Downloaded June 2010; [88]) search engine to search the “teleost fish” database and mask repetitive elements. A primary de novo assembly was initially done using the 454 default settings in mira 3.2.0 [89] with a minimum percent identity of 94 %. A secondary assembly was performed on the contigs produced from mira 3.2.0 and all remaining singletons in cap3. A minimum overlap of 25 bp and a minimum %ID of overlap of 95 % was used in the secondary assembly. Finally all contigs less than 300 bp in length were removed before additional analyses.

Annotation of the S. goodei and S. saxicola datasets

Both EST datasets were annotated in blast2go [90] with the following Blast parameters: blastx to the swiss-prot database [91], an E-value of 1.0 x E−6, 20 blast hits, and a High Scoring Pair length cutoff of 33 nt. The annotation parameters were an E-value hit filter of 1.0 x E−6, annotation score cutoff of 55, and a gene ontology (GO) weight of 5. A two-tailed Fisher’s Exact Test was used in blast2go to determine whether there was enrichment of GO terms for the orthologous pairs that contained a Ka/Ks > 0.5 in comparison to orthologous pairs that were conservative (Ka/Ks < 0.1).

Detection of orthologs from the S. goodei and S. saxicola datasets and estimation of selection

blastx (NCBI blast version, 2.2.17) from the standalone blast package [84] was used to identify homologs in both S. goodei and S. saxicola ESTs against the swiss-prot database (downloaded June 2011) with five teleost datasets from fugu (Takifugu rubripes), medaka (Oryzias latipes), green spotted pufferfish (Tetraodon nigroviridis), stickleback (Gasterosteus aculeatus), and zebrafish (Danio rerio) in the Ensembl database [92] (Ensembl 63). Afterwards the blastx reports and the EST sequences were processed through orfpredictor [93], which identifies putative open reading frames and translates nucleotide sequences into protein sequences. The translated protein datasets from S. goodei and S. saxicola were used in inparanoid 4.0 to identify orthologs and avoid the inclusion of paralogs. Danio rerio (Ensembl dataset, Zv9) was used as an outgroup for removing potential false orthologs. Orthologous pairs were aligned based on the putative open reading frame using pal2nal 12.2 [94] and Perl scripts that include clustal w 2.0.10 [95]. Ka and Ks were calculated for the orthologous pairs between S. goodei and S. saxicola in kaks_calculator 1.2 [96] by using the YN model [97].

Ortholog identification and positive selection

We used 5336 and 18,505 contigs from S. goodei and S. saxicola ESTs respectively for the identification of orthologs and the Ka/Ks analyses. There were 1559 orthologs detected with inparanoid 4.0. Once processed through kaks_calculator 1.2, pairs were removed from our analyses if the alignment length was less than 150 bp and/or the Ka/Ks values were greater than 50. Ortholog pairs with a Ks value less than 0.1 were further analyzed, which has been used as a benchmark to avoid inclusion of paralogs [98]. We also included a second set of ortholog pairs with Ks values within the range of 0.1–0.5 (Fig. 5). This allowed us to determine whether the Ks > 0.1 benchmark should be extended for our analyses.

Fig. 5
figure5

Frequency of ortholog pairs with synonymous substitution estimates. The black dotted line indicates the traditional cut off line and the red dotted line indicates our new threshold cut-off

PAML analyses and zona pellucida phylogeny construction

We analyzed three different datasets, in which we tested for adaptive evolution with the paml 4.4 [99] software package. We used codeml which is part of the paml 4.4 package and tested for positive selection with M7 (neutral model) and M8 (selection model) [100] and conducted Likelihood Ratio Tests (LRTs) between the two models. We conducted a tblastx search with additional datasets from Oreochromis niloticus, Oryzias latipes, Sebastes rastrelliger, and Sebastes caurinus to identify orthologs. Only ortholog pairs of length 65 codons or greater, and 85 % identity were utilized for our analysis. The first dataset consisted of orthologs from Oreochromis niloticus (Nile Tilapia), Oryzias latipes (Medaka) and the two focal Sebastes species, which contained eleven genes. Orthologs were identified for genes with elevated Ka/Ks values. These two model species were chosen due to their close relationship to rockfishes when we analyzed our ZP phylogenies. The second dataset included additional orthologs identified from a previous study (S. caurinus and S. rastrelliger; 46) to further validate signatures of adaptive evolution within the genus that contained four gene pairs.

A third dataset contained sequences from the zona pellucida (ZP) gene family with 5 gene pairs. Sequences annotated to this family were used to construct a phylogeny of the ZP gene family and a fine-scaled analysis of positive selection. S. goodei and S. saxicola sequences were trimmed and translated within orf predictor. Based on the annotations (assignment of ZP subfamily), the longest ESTs from the two Sebastes species were used for phylogenetic analyses and the following subfamilies were identified: ZPAX, ZPB, and ZPC. ZP subfamilies (one sequence alignment dataset for ZPAX and ZPB, and another for ZPC) were aligned with mafft 6 (http://mafft.cbrc.jp/alignment/server/) and a ZPA homolog from Xenopus levis was used as an outgroup. These sequences along with teleosts sequences with known ZP annotation [32] and the top tblastx hits from GenBank were translated and aligned in mafft 6 (http://mafft.cbrc.jp/alignment/server/). After alignment, sequences were processed through prottest 3 [101] to determine a model of protein evolution. Phylogenies were constructed with aligned sequences and a selected protein model in PhyML 3.0 [102]. If both a S. goodei and S. saxicola homologous pair were present, they were processed in kaks_calculator 1.2 by using the YN model [97] to estimate positive selection.

UTR divergence

We were interested in the neutral substitutional mutation rate within our transcriptomic datasets. In addition, we expected the UTR regions to be highly divergent only if paralogs were identified in our ortholog search between the two datasets. This will give an indication that our Ks cut-off at 0.5 is valid. We developed scripts, which were used to remove 5’ and 3’ UTRs from the orthologous pairs and conduct a pairwise alignment in muscle 3.7 [103]. Lastly, we estimated sequence divergence using a Jukes-Cantor model as suggested by Elmer et al. [1] only pairs greater than 50 bp were used for our analyses. Only pairs that contained both a 5’ and 3’ estimate were removed to prevent a partial paired analysis and we conducted a pair-wise blast of the orthologs to assess the quality of our alignments. blast scores of 90 bits or greater were included for our divergence analysis. Coding regions were reprocessed through kaks_calculator 1.2 and Ks values were estimated with a Jukes-Cantor correction in order to make comparisons. Simple pairwise t-tests and Wilcoxon Rank Sum Tests were calculated between and within coding regions and UTRs by using r (https://www.r-project.org/).

Availability of supporting data

Raw reads from S. goodei were deposited to dbEST under the accession numbers JZ693907-JZ704944. Short reads from S. saxicola were deposited to the Short Read Archive under the accession SRR1212396.

References

  1. 1.

    Elmer KR, Fan S, Gunter HM, Jones JC, Boekhoff S, Kuraku S, et al. Rapid evolution and selection inferred from transcriptomes of sympatric crater lake cichlid fishes. Mol Ecol. 2010;19:197–211.

    CAS  Article  PubMed  Google Scholar 

  2. 2.

    Ellegren H. Comparative genomics and the study of evolution by natural selection. Mol Ecol. 2008;17:4586–96.

    Article  PubMed  Google Scholar 

  3. 3.

    Clark AG et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450:203–18.

    Article  CAS  PubMed  Google Scholar 

  4. 4.

    Heger A, Ponting CP. Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes. Genome Res. 2007;17:1873–49.

    Article  CAS  Google Scholar 

  5. 5.

    Miyata T, Miyazawa S, Yasunaga T. Two types of amino acid substitutions in protein evolution. J Mol Evol. 1979;12:219–36.

    CAS  Article  PubMed  Google Scholar 

  6. 6.

    Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986;3:418–26.

    CAS  PubMed  Google Scholar 

  7. 7.

    Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–36.

    CAS  PubMed  Google Scholar 

  8. 8.

    Mayr E. Systematics and the origin of species. New York: Columbia Press; 1942.

    Google Scholar 

  9. 9.

    Taylor MS, Hellberg ME. Marine radiations at small geographical scales: speciation in neotropical reef gobies (Elacatinus). Evolution. 2005;59:374–85.

    PubMed  Google Scholar 

  10. 10.

    Ingram T. Speciation along a depth gradient in a marine adaptive radiation. Proc Roy Soc Lond B Biol Sci. 2011;278:613–8.

    Article  Google Scholar 

  11. 11.

    Bierne N, Bonhomme F, David P. Habitat preference and the marine-speciation paradox. Proc Roy Soc Lond B Biol Sci. 2003;270:1399–406.

    Article  Google Scholar 

  12. 12.

    Mangel M, Kindsvater HK, Bonsall MB. Evolutionary analysis of life span, competition, and adaptive radiation, motivated by the Pacific rockfishes (Sebastes). Evolution. 2007;61:1208–24.

    Article  PubMed  Google Scholar 

  13. 13.

    Hyde JR, Vetter RD. The origin, evolution, and diversification of rockfishes of the genus Sebastes (Cuvier). Mol Phylgenet Evol. 2007;44:790–811.

    CAS  Article  Google Scholar 

  14. 14.

    Johns GC, Avise JC. Tests for ancient species flocks based on molecular phylogenetic appraisals of Sebastes rockfishes and other marine fishes. Evolution. 1998;52:1135–46.

    Article  Google Scholar 

  15. 15.

    Rocha-Olivares A, Kimbrell CA, Eitner BJ, Vetter RD. Evolution of the mitochondrial cytochrome b gene sequence in the species-rich genus Sebastes (Teleostei: Scorpaenidae) and its utility in testing the monophyly of the subgenus Sebastomus. Mol Phylogenet Evol. 1999;11:426–40.

    CAS  Article  PubMed  Google Scholar 

  16. 16.

    Helvey M. First oberservations of courtship behavior in rockfish, genus Sebastes. Copeia. 1982;763–770.

  17. 17.

    Endler JA, Basolo AL. Sensory ecology, receiver biases and sexual selection. Trends Ecol Evol. 1998;13:415–20.

    CAS  Article  PubMed  Google Scholar 

  18. 18.

    Swanson WJ, Vacquier VD. The rapid evolution of reproductive proteins. Nat Rev Genet. 2002;3:137–44.

    CAS  Article  PubMed  Google Scholar 

  19. 19.

    Masta SE, Maddison WP. Sexual selection driving diversification in jumping spiders. Proc Natl Acad Sci. 2002;99:4442–7.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  20. 20.

    Palumbi SR. Genetic divergence, reproductive isolation, and marine speciation. Annu Rev Ecol Syst. 1994;25:547–72.

    Article  Google Scholar 

  21. 21.

    Aagaard JE, Vacquier VD, MacCoss MJ, Swanson WJ. ZP domain proteins in the abalone egg coat include a paralog of VERL under positive selection that binds lysin and 18-kDa sperm proteins. Mol Biol Evol. 2010;27:193–203.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  22. 22.

    Lee Y-H, Vacquier VD. Evolution and systematics in Haliotidae (Mollusca: Gastropoda): inferences from DNA sequences of sperm lysin. Mar Biol. 1995;124:267–8.

    CAS  Article  Google Scholar 

  23. 23.

    Turner LM, Hoekstra HE. Adaptive evolution of fertilization proteins within a genus: variation in ZP2 and ZP3 in deer mice (Peromyscus). Mol Biol Evol. 2006;23:1656–69.

    CAS  Article  PubMed  Google Scholar 

  24. 24.

    Levitan DR, Ferrell DL. Selection on gamete recognition proteins depends on sex, density, and genotype frequency. Science. 2006;312:267–9.

    CAS  Article  PubMed  Google Scholar 

  25. 25.

    Palumbi SR. Speciation and the evolution of gamete recognition genes: pattern and process. Heredity. 2009;102:66–76.

    CAS  Article  PubMed  Google Scholar 

  26. 26.

    Pujolar JM, Pogson GH. Positive Darwinian selection in gamete recognition proteins of Strongylocentrotus sea urchins. Mol Ecol. 2011;20:4968–82.

    CAS  Article  PubMed  Google Scholar 

  27. 27.

    Clark NL, Findlay GD, Yi X, MacCoss MJ, Swanson WJ. Duplication and selection on abalone sperm lysin in an allopatric population. Mol Biol Evol. 2007;24:2081–90.

    CAS  Article  PubMed  Google Scholar 

  28. 28.

    Love MS, Yoklavich M, Thorsteinson L. The rockfishes of the Northeast Pacific. Berkeley: University of California Press; 2002.

    Google Scholar 

  29. 29.

    Hyde JR, Kimbrell C, Robertson L, Clifford K, Lynn E, Vetter R. Multiple paternity and maintenance of genetic diversity in the live-bearing rockfishes Sebastes spp. Mar Ecol Prog Ser. 2008;357:245–53.

    Article  Google Scholar 

  30. 30.

    Sogard SM, Gilbert-Horvath E, Anderson EC, Fisher R, Berkeley SA, Garza JC. Multiple paternity in viviparous kelp rockfish Sebastes atrovirens. Environ Biol Fish. 2008;81:7–13.

    Article  Google Scholar 

  31. 31.

    Evans JP. Getting sperm and egg together: things conserved and things diverged. Biol Reprod. 2000;63:355–60.

    CAS  Article  PubMed  Google Scholar 

  32. 32.

    Modig C, Westerlund L, Olsson P-E. Chapter 5: Oocyte zona pellucida proteins. In: Babin PJ, Cerda J, and Lubzens E. The Fish Oocyte: From Basic Studies to Biotechnological Applications. Springer; 2007. p. 113–39.

    Google Scholar 

  33. 33.

    Spargo SC, Hope RM. Evolution and nomenclature of the zona pellucida gene family. Biol Reprod. 2003;68:358–62.

    CAS  Article  PubMed  Google Scholar 

  34. 34.

    Podolsky RD. Fertilization ecology of egg coats: physical versus chemical contributions to fertilization success of free-spawned eggs. J Exp Biol. 2002;205:1657–68.

    PubMed  Google Scholar 

  35. 35.

    Dumont JN, Brummet AR. The vitelline envelope, chorion and micropyle of Fundulus heteroclitus eggs. Gamete Res. 1980;3:25–44.

    Article  Google Scholar 

  36. 36.

    Tian X, Pascal G, Fouchécourt S, Pontarotti P, Monget P. Gene birth, death, and divergence: The different scenarios of reproduction-related gene evolution. Biol Reprod. 2009;80:616–21.

    CAS  Article  PubMed  Google Scholar 

  37. 37.

    Smith J, Paton IR, Hughes DC, Burt DW. Isolation and mapping the chicken zona pellucida genes: an insight into the evolution of orthologous genes in different species. Mol Reprod Dev. 2005;70:133–45.

    CAS  Article  PubMed  Google Scholar 

  38. 38.

    Goudet G, Mugnier S, Callebaut I, Monget P. Phylogenetic analysis and identification of pseudogenes reveal a progressive loss of zona pellucida genes during evolution of vertebrates. Biol Reprod. 2008;78:796–806.

    CAS  Article  PubMed  Google Scholar 

  39. 39.

    Berlin S, Smith NGC. Testing for adaptive evolution of the female reproductive protein ZPC in mammals, birds and fishes reveals problems with the M7-M8 likelihood ratio test. BMC Evol Biol. 2005;5:65.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  40. 40.

    Burford MO, Bernardi G. Incipient speciation within a subgenus of rockfish (Sebastosomus) provides evidence of recent radiations within an ancient species flock. Mar Biol. 2008;154:701–17.

    Article  Google Scholar 

  41. 41.

    Kendall Jr AW. An historical review of Sebastes taxonomy and systematics. Mar Fish Rev. 2000;62:1–23.

    Google Scholar 

  42. 42.

    Magnuson-Ford K, Ingram T, Redding DW, Mooers AO. Rockfish (Sebastes) that are evolutionarily isolated are also large, morphologically distinctive and vulnerable to overfishing. Biol Cons. 2009;142:1787–96.

    Article  Google Scholar 

  43. 43.

    Love MS, Morris P, McCrae M. Life history aspects of 19 rockfish species (Scorpaenidae: Sebastes) from the southern California bight. Silver Spring, Maryland: NOAA Tech Rep. NMFS 1990. p. 87.

  44. 44.

    Li Z, Gray AK, Love MS, Asahida T, Gharrett AJ. Phylogeny of members of the rockfish (Sebastes) subgenus Pteropodus and their relatives. Can J Zool. 2006;84:527–36.

    CAS  Article  Google Scholar 

  45. 45.

    Sivasundar A, Palumbi SR. Parallel amino acid replacements in the rhodopsins of the rockfishes (Sebastes spp.) associated with shifts in habitat depth. J Evol Biol. 2010;23:1159–69.

    CAS  Article  PubMed  Google Scholar 

  46. 46.

    Heras J, Koop BF, Aguilar A. A transcriptomic scan for positively selected genes in two closely related marine fishes: Sebastes caurinus and S. rastrelliger. Mar Genomics. 2011;4:93–8.

    Article  PubMed  Google Scholar 

  47. 47.

    Eigenmann CH, Eigenmann RS. Description of a new species of Sebastodes. Ca Acad Sci Proc. 1890;III(2):13.

    Google Scholar 

  48. 48.

    Gilbert CH. A preliminary report on the fishes collected by the steamer Albatross on the Pacific coast of North America during the year 1889, with descriptions of twelve new genera and ninety-two new species. Proc US Natl Mus. 1890;13:49–126.

    Article  Google Scholar 

  49. 49.

    Huang X, Madan A. CAP3: A DNA sequence assembly program. Genome Res. 1999;9:868–77.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  50. 50.

    von Schalburg KR, Rise ML, Brown GD, Davidson WS, Koop BF. A comprehensive survey of the genes involved in maturation and development of the rainbow trout ovary. Biol Reprod. 2005;72:687–99.

    Article  CAS  Google Scholar 

  51. 51.

    Salem M, Rexroad CE, Wang J, Thorgaard GH, Yao J. Characterization of the rainbow trout transcriptome using Sanger and 454-pyrosequencing approaches. BMC Genomics. 2010;11:564.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  52. 52.

    Tedersoo L, Nilsson RH, Abarenkov K, Jairus T, Sadam A, Saar I, et al. 454 Pyrosequencing and Sanger sequencing of tropical mycorrhizal fungi provide similar results but reveal substantial methodological biases. New Phytologist. 2010;188:291–301.

    CAS  Article  PubMed  Google Scholar 

  53. 53.

    Metzker ML. Sequencing technologies—the next generation. Nat Rev Genet. 2010;11:31–46.

    CAS  Article  PubMed  Google Scholar 

  54. 54.

    Jansa SA, Lundrigan BL, Tucker PK. Tests for positive selection on immune and reproductive genes in closely related species of the murine genus Mus. J Mol Evol. 2003;56:294–307.

    CAS  Article  PubMed  Google Scholar 

  55. 55.

    Péterfy M, Gyuris T, Basu R, Takács L. Lissencephaly-1 is one of the most conserved proteins between mouse and human: a single amino-acid difference in 410 residues. Gene. 1994;150:415–6.

    Article  PubMed  Google Scholar 

  56. 56.

    Escalier D. Knockout mouse models of sperm flagellum anomalies. Hum Reprod Update. 2006;12:449–61.

    CAS  Article  PubMed  Google Scholar 

  57. 57.

    Baarends WM, van der Laan R, Grootegoed JA. DNA repair mechanisms and gametogenesis. Reprod. 2001;121:31–9.

    CAS  Article  Google Scholar 

  58. 58.

    Körner CG, Wormington M, Muckenthaler M, Schneider S, Dehlin E, Wahle E. The deadenylating nuclease (DAN) is involved in poly(A) tail removal during the meiotic maturation of Xenopus oocytes. EMBO J. 1998;17:5427–37.

    Article  PubMed Central  PubMed  Google Scholar 

  59. 59.

    Vos MJ, Hageman J, Carra S, Kampinga HH. Structural and Functional Diversities between members of the human HSPB, HSPH, HSPA, and DNAJ chaperone families. Biochem. 2008;47:7001–11.

    CAS  Article  Google Scholar 

  60. 60.

    García-Herrero S, Garrido N, Martínez-Conejero JA, Remohí J, Pellicer A, Meseguer M. Ontological evaluation of transcriptional differences between sperm of infertile males and fertile donors using microarray analysis. J Assist Reprod Genet. 2010;27:111–20.

    Article  PubMed Central  PubMed  Google Scholar 

  61. 61.

    Munk K. Maximum ages of groundfishes in waters off Alaska and British Columbia and considerations of age determination. Alaska Fish Res Bull. 2001;8:12–21.

    Google Scholar 

  62. 62.

    Chen D, Pan KZ, Palter JE, Kapahi P. Longevity determined by developmental arrest genes in Caenorhabditis elegans. Aging Cell. 2007;6:525–33.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  63. 63.

    Deelen J, Uh HW, Monajemi R, van Heemst D, Thijssen PE, Böhringer S, et al. Gene set analysis of GWAS data for human longevity highlights the relevance of the insulin/IGF-1 signaling and telomere maintenance pathways. Age. 2013;35:235–49.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  64. 64.

    Hansen M, Taubert S, Crawford D, Libina N, Lee SJ, Kenyon C. Lifespan extension by conditions that inhibit translation in Caenorhabditis elegans. Aging Cell. 2007;6:95–110.

    CAS  Article  PubMed  Google Scholar 

  65. 65.

    Somarelli JA, Herrera RJ. Evolution of the 12 kDa FK506_binding protein gene. Biol of the Cell. 2007;99:311–21.

    CAS  Article  Google Scholar 

  66. 66.

    Nielsen R, Bustamante C, Clark AG, Glanowski S, Sackton TB, Hubisz MJ, et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 2005;3:e170.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  67. 67.

    Baldo L, Santos ME, Salzburger W. Comparative transcriptomics of Eastern African cichlid fishes shows signs of positive selection and a large contribution of untranslated regions to genetic diversity. Genome Biol Evol. 2011;3:443–55.

    Article  PubMed Central  PubMed  Google Scholar 

  68. 68.

    Hughes AL. Looking for Darwin in all the wrong places: the misguided quest for positive selection at the nucleotide sequence level. J Hered. 2007;99:364–73.

    CAS  Article  Google Scholar 

  69. 69.

    Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, et al. Patterns of Positive Selection in Six Mammalian Genomes. PLoS Genet. 2008;4:1–17.

    Article  CAS  Google Scholar 

  70. 70.

    Jovine L et al. Zona pellucida domain proteins. Annu Rev Biochem. 2005;74:83–114.

    CAS  Article  PubMed  Google Scholar 

  71. 71.

    Meslin C, Mugnier S, Callebaut I, Laurin M, Pascal G, Poupon A, et al. Evolution of Genes Involved in Gamete Interaction: Evidence for Positive Selection Duplications and Losses in Vertebrates. PLoS One. 2012;7:e44548.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  72. 72.

    Barluenga M et al. Sympatric speciation in Nicaraguan crater lake cichlid fish. Nature. 2006;439:719–23.

    CAS  Article  PubMed  Google Scholar 

  73. 73.

    Hurst LD. The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 2002;18:486–7.

    Article  PubMed  Google Scholar 

  74. 74.

    Andolfatto P. Adaptive evolution of non-coding DNA in Drosophila. Nature. 2005;437:1149–52.

    CAS  Article  PubMed  Google Scholar 

  75. 75.

    Russo CAM, Takezaki N, Nei M. Molecular Phylogeny and Divergence Times of Drosophilid Species. Mol Biol Evol. 1995;12:391–404.

    CAS  PubMed  Google Scholar 

  76. 76.

    Hellmann I, Zollner S, Enard W, Ebersberger I, Nickel B, Paabo S. Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 2003;13:831–7.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  77. 77.

    O’Brien KP, Remm M, Sonnhammer ELL. INPARANOID: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005;33:D476–80.

    Article  PubMed Central  PubMed  Google Scholar 

  78. 78.

    Coy JF et al. Highly conserved 3’ UTR and expression pattern of FXR1 points to a divergent gene regulation of FXR1 and FMR1. Hum Mol Genet. 1995;4:2209–18.

    CAS  Article  PubMed  Google Scholar 

  79. 79.

    Meyer A, Van de Peer Y. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays. 2005;27:937–45.

    CAS  Article  PubMed  Google Scholar 

  80. 80.

    Sunagawa S, Wilson EC, Thaler M, Smith ML, Caruso C, Pringle JR, et al. Generation and analysis of transcriptomic resources for a model system on the rise: the sea anemone Aiptasia pallida and its dinoflagellate endosymbiont. BMC Genomics. 2009;10:258.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  81. 81.

    Forment J, Gilabert F, Robles A, Conejero V, Nuez F, Blanca JM. EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration. BMC Bioinformatics. 2008;9:5.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  82. 82.

    Chou HH, Holmes MH. DNA sequence quality trimming and vector removal. Bioinformatics. 2001;17:1093–104.

    CAS  Article  PubMed  Google Scholar 

  83. 83.

    Smit AFA, Hubley R, Green P. 2010. RepeatMasker Open-3.0. http://www.repeatmasker.org.

  84. 84.

    Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. J Mol Biol. 1990;215:403–10.

    CAS  Article  PubMed  Google Scholar 

  85. 85.

    Blanca J, Chevreux B. sff_extract. Valencia, Spain: COMAV Institute, Universidad Politécnica; 2010.

    Google Scholar 

  86. 86.

    Barker MS, Dlugosch KM, Dinh L, Challa RS, Kane NC, King MG, et al. EvoPipes.net: Bioinformatic tools for ecological and evolutionary genomics. Evol Bioinform. 2010;6:143–9.

    Article  Google Scholar 

  87. 87.

    Lassmann T, Hayashizaki Y, Daub CO. TagDust - A program to eliminate artifacts from next generation sequencing data. Bioinformatics. 2010;25:2839–40.

    Article  CAS  Google Scholar 

  88. 88.

    Green P. Crossmatch. 1994. http://www.phrap.org.

  89. 89.

    Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WE, Wetter T, et al. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 2004;14:1147–59.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  90. 90.

    Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–6.

    CAS  Article  PubMed  Google Scholar 

  91. 91.

    Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res. 1997;25:31–6.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  92. 92.

    Hubbard T et al. Ensembl 2005. Nucleic Acids Res. 2005;33:D447–53.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  93. 93.

    Min XJ, Butler G, Storms R, Tsang A. OrfPredictor: predicting protein-coding regions in EST-derived sequences. Nucleic Acids Res. 2005;33:W677–680.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  94. 94.

    Suyama M, Torrents D, Bork P. pal2nal: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–12.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

  95. 95.

    Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23:2947–8.

    CAS  Article  PubMed  Google Scholar 

  96. 96.

    Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J. KAKS_CALCULATOR: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics. 2006;4:259–63.

    CAS  Article  PubMed  Google Scholar 

  97. 97.

    Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000;17:32–43.

    CAS  Article  PubMed  Google Scholar 

  98. 98.

    Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, et al. Natural selection on protein-coding genes in the human genome. Nature. 2005;437:1153–7.

    CAS  Article  PubMed  Google Scholar 

  99. 99.

    Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comp Appl Biosci. 1997;13:555–6.

    CAS  PubMed  Google Scholar 

  100. 100.

    Yang Z, Nielsen R, Goldman N, Pedersen AM. Codon-Substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000;155:431–49.

    CAS  PubMed Central  PubMed  Google Scholar 

  101. 101.

    Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005;21:2104–5.

    CAS  Article  PubMed  Google Scholar 

  102. 102.

    Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–21.

    CAS  Article  PubMed  Google Scholar 

  103. 103.

    Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.

    CAS  Article  PubMed Central  PubMed  Google Scholar 

Download references

Acknowledgements

We would like to thank D. Ardell for assistance with Perl scripts. We thank Eddy Rubin, the DoE Joint Genome Institute Director, for providing the sequencing support needed to perform this work in the context of a course co-taught with JGI scientists at UC Merced. This work was supported by a grant from the NSF (DEB-0719475) to A.A., a UC Merced GRC fellowship to J.H. We would also like to thank M. N Dawson, M. Medina, and D. Ardell for providing constructive comments on the scientific merit of this manuscript. A. Winek and D. Elizondo who reviewed this manuscript for grammatical errors and clarity of concepts. Lastly, J. Liberto for assistance in organizing data files prior to analyses.

Author information

Affiliations

Authors

Corresponding author

Correspondence to Joseph Heras.

Additional information

Competing interests

All authors declare that they have no competing interests.

Authors’ contributions

JH analyzed the data and drafted the manuscript. KM contributed to this manuscript by assembling 454 data from S. saxicola. SS prepared, cleaned, and assembled ESTs from S. goodei. AA provided guidance and mentorship for this study, and both JH and AA generated Perl scripts for analyzing the data. All authors read the manuscript, provided constructive feedback, and approved of the final manuscript.

Additional files

Additional file 1:

EST assembly statististics for S. goodei. (XLS 22 kb)

Additional file 2:

S. saxicola assembly summary statistics. (XLS 21 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Heras, J., McClintock, K., Sunagawa, S. et al. Gonadal transcriptomics elucidate patterns of adaptive evolution within marine rockfishes (Sebastes). BMC Genomics 16, 656 (2015). https://doi.org/10.1186/s12864-015-1870-0

Download citation

Keywords

  • Bioinformatics
  • Orthologs
  • Positive selection
  • Reproductive genes
  • Untranslated region
  • Zona pellucida