Landscape genomics: natural selection drives the evolution of mitogenome in penguins

Background Mitochondria play a key role in the balance of energy and heat production, and therefore the mitochondrial genome is under natural selection by environmental temperature and food availability, since starvation can generate more efficient coupling of energy production. However, selection over mitochondrial DNA (mtDNA) genes has usually been evaluated at the population level. We sequenced by NGS 12 mitogenomes and with four published genomes, assessed genetic variation in ten penguin species distributed from the equator to Antarctica. Signatures of selection of 13 mitochondrial protein-coding genes were evaluated by comparing among species within and among genera (Spheniscus, Pygoscelis, Eudyptula, Eudyptes and Aptenodytes). The genetic data were correlated with environmental data obtained through remote sensing (sea surface temperature [SST], chlorophyll levels [Chl] and a combination of SST and Chl [COM]) through the distribution of these species. Results We identified the complete mtDNA genomes of several penguin species, including ND6 and 8 tRNAs on the light strand and 12 protein coding genes, 14 tRNAs and two rRNAs positioned on the heavy strand. The highest diversity was found in NADH dehydrogenase genes and the lowest in COX genes. The lowest evolutionary divergence among species was between Humboldt (Spheniscus humboldti) and Galapagos (S. mendiculus) penguins (0.004), while the highest was observed between little penguin (Eudyptula minor) and Adélie penguin (Pygoscelis adeliae) (0.097). We identified a signature of purifying selection (Ka/Ks < 1) across the mitochondrial genome, which is consistent with the hypothesis that purifying selection is constraining mitogenome evolution to maintain Oxidative phosphorylation (OXPHOS) proteins and functionality. Pairwise species maximum-likelihood analyses of selection at codon sites suggest positive selection has occurred on ATP8 (Fixed-Effects Likelihood, FEL) and ND4 (Single Likelihood Ancestral Counting, SLAC) in all penguins. In contrast, COX1 had a signature of strong negative selection. ND4 Ka/Ks ratios were highly correlated with SST (Mantel, p-value: 0.0001; GLM, p-value: 0.00001) and thus may be related to climate adaptation throughout penguin speciation. Conclusions These results identify mtDNA candidate genes under selection which could be involved in broad-scale adaptations of penguins to their environment. Such knowledge may be particularly useful for developing predictive models of how these species may respond to severe climatic changes in the future. Electronic supplementary material The online version of this article (10.1186/s12864-017-4424-9) contains supplementary material, which is available to authorized users.


Background
The mitochondrial genome (mtDNA) is under continuous selection since the 13 protein-coding genes produce polypeptide products that work in association with nuclear-encoded subunits of protein complexes involved in oxidative phosphorylation (OXPHOS) [1]. Three types of hypotheses have been proposed to explain mitogenome evolution: 1) mitochondrial OXPHOS generates both energy and heat, and environmental temperature imposes a selective trade off between ATP production and heat [2][3][4][5][6]; 2) risk of starvation is associated with restricted food availability, since starvation can generate more efficient coupling of energy production by OXPHOS pathway [7][8][9][10]; 3) a more recent and debated hypothesis of immune response to pathogens [11][12][13][14][15]. Therefore, chlorophyll (Chl), as a measurement of primary productivity [16], and sea surface temperature (SST), are important environmental variables for marine organisms that are directly related to the efficiency of the OXPHOS and relevant to understanding mtDNA gene selection patterns. As OXPHOS is the terminal stage of cellular respiration, disruption of this process will be lethal for individual cells and if it occurs widely in an individual, it will adversely impact its fitness [3]. Consequently, the mitochondrial genome on one hand evolves under continuous purifying selection (or negative selection) that removes deleterious variation from populations [17], while on the other hand, beneficial variation in extreme temperature environments may be under positive selection, increasing in frequency in a population or species. It is possible that selection may operate similarly among species in similar environments or it may drive adaptations in different genes.
Patterns of adaptation in the mtDNA of organisms have been evaluated mostly at the population level by associating each population or lineage with local climatic conditions [8,[18][19][20][21]; however, only relatively few have incorporated environmental data in the genetic analyses and generally only using nuclear genes or whole-genome data [22,23]. Environmental temperature is the main variable that has been linked to selection in mtDNA genes [8,[18][19][20], along with altitude [24][25][26] and oxygen availability [27]. Although most studies on the role of selection on mtDNA genome evolution in vertebrate species have focused on mammals at the intraspecific level [28][29][30][31], recently there has been an increased focus on birds [32][33][34]. Here, we provide one of the first species-level analyses in birds linking patterns of mtDNA selection with environmental variables (temperature, chlorophyll) which are associated with the evolution of this genome. Chlorophyll (Chl) is an excellent indirect measurement of food availability because primary productivity is associated with species distributions at trophic levels, including krill and penguins [16].
Penguins inhabit a broad latitudinal distribution in the Southern Hemisphere from the tropics to the extreme cold of the Antarctic. While all penguin species can live and forage at sea in cold, nutrient-rich water (including those breeding on the Galapagos Islands), they breed on land under varying climatic conditions. The northernmost species is the Galapagos penguin (Spheniscus mendiculus), while the southernmost distributed species, the Adélie penguin (P. adeliae) and Emperor penguin (Aptenodytes forsteri), breed in the Antarctic, south of the Antarctic Convergence (Polar Front). Emperor penguins breed on stable fast ice during the Antarctic winter when temperatures drop regularly to −50°C. As the evolution, speciation and extinction of penguins are closely related to historical climatic changes [35][36][37][38], the broad distribution of penguins, which includes different and extreme climatic conditions, provides a unique opportunity to identify molecular genetic patterns associated with these climatic conditions.
We used Next Generation Sequencing approaches to study patterns of selection in the mitochondrial genome of penguins from two genera (Spheniscus and Pygoscelis) and correlated the results with remote sensing spatial data for each species from throughout their distribution (sea surface temperature, chlorophyll levels, and both variables combined). The four Spheniscus penguin species are distributed along the coast of southern Africa (African penguin S. demersus) and along the South America coast from Galapagos Islands in the tropics (Galapagos penguin S. mendiculus) to southern Patagonia (Humboldt S. humbodti and Magellanic penguin S. magellanicus) (Fig. 1). Galapagos and Humboldt penguin populations have experienced high mortality rates [39,40] and decreased breeding success [39,41] from increased water temperatures during El Niño Southern Oscillations (ENSO) [42]. The three Pygoscelis penguins breed on sub-Antarctic islands and the coast of Antarctica and they have sympatric distributions in the Western Antarctic Peninsula (WAP). The Gentoo penguin (P. papua) has the northernmost range in the genus, with a distribution along sub-Antarctic Islands and the Antarctic Peninsula. The Chinstrap penguin (P. antarcticus) has a more southern distribution, almost exclusively around the Antarctic peninsula. The Adélie penguin is the most southern and pagophilic species, with a circumpolar distribution [43]. Over the last decades climate change has impacted the WAP, increasing sea surface temperatures (SST) and changing patterns of penguin food availability (e.g. krill) [44][45][46]. Concurrently, Adélie and Chinstrap penguin populations have been decreasing, while Gentoo penguin populations have increased in numbers and distributions [44][45][46].
Historical climate changes have been linked with penguin species diversification. Severe declines in Antarctic temperature that began approximately 12 mya coincide with the divergence of the four major penguin groups, around 11-16 mya [38]. This penguin phylogeny is well characterized, making it possible to assess evidence of selection and adaptive evolution of the mtDNA and to explore possible links with environmental changes, including genes involved in penguin thermal adaptation. Sequences of entire mitochondrial genomes are available for five of the 18 extant species of penguins [47][48][49][50], including the Adélie, Chinstrap, African, Little, and Southern Rockhopper (E. chrysocome) penguins and more recently the entire genome became available for the Adélie and Emperor penguin [51]. The mtDNA genome of P. antarcticus consists of 15,972 bp and has 94% identity with its sister species P. adeliae [50]. Subramanian et al. [49] compared the mitochondrial genomes of modern and ancient (up to 44,000 years old) P. adeliae, documenting higher evolutionary rates than described earlier by phylogenetic studies. However, those studies evaluated the genome without a broader comparative approach to understand the evolution of the group and assess the evidence of signatures of selection on the mitochondrial genome.
We hypothesize that selection operates differentially on the mtDNA coding genes of penguin species in response to environmental variables. We evaluated variation in mitochondrial genomes at different taxonomic levels in seven species of penguins from South America and Africa (all four Spheniscus species) and Antarctica (all three Pygoscelis species). We then compared our results with two penguin species from temperate and sub-Antarctic regions (E. minor and E. chrysocome) and with A. forsteri, which survives and breeds during the mostextreme cold conditions of the Antarctic winter. We evaluated evolutionary rates and signatures of natural selection for all 13 mtDNA protein-coding genes and across the mtDNA genome phylogeny for the penguin species. Lastly, we assessed the genetic data for each gene against the spatial environmental data (sea surface temperature, chlorophyll and both variables combined) from remote sensing and GIS databases for each species along its distributional range.

SOLiD sequencing
Genomic DNA was isolated from blood samples obtained from all penguins and preserved in ethanol using the salt method [53]. DNA from six individuals of the three species was quantified and quality checked with Invitrogen's Pico-Green®assay kit (fluorometry). Genomic sequencing in ABI-SOLiD™ 5500 XL was performed at Omics Solution, a Next Generation Sequencing facility. DNA was desalted then concentrated using standard EtOH/sodium acetate precipitation at 20°C for 2 h, followed by two 70% EtOH washes. Then the pooled DNA was re-dissolved in low TE as per standard protocol for ABI SOLiD sequencing of genomic DNA fragment libraries. DNA samples were sheared in Covaris™S220 System, which fragments the input DNA, by sonication, into small fragments with a mean size of 160 bp.
The fragmented DNA was then purified with SOLiD Library Column Purification Kit, and libraries were prepared according to standard SOLiD protocols. Fragment libraries for the twelve penguins were prepared separately, P1 and P2 adapters were ligated, and each sample was tagged with a different barcode (a known sequence of ten bp in adapters). Prepared libraries were quantified by real-time PCR on a Light Cycler®Nano (Roche) using Invitrogen's Quantification Kit for SOLiD. The double stranded library was added at a concentration of 0.2 pg/μL to the emulsion with 2400 million beads, according to manufacturers' instructions. Thirty percent of the beads were P2 positive (contained amplified library fragments) before enrichment and 90% of the beads were P2 positive after enrichment, yielding 790 million beads deposited on the Flow Chip. Library beads were sequenced on a SOLiD™ 5500 XL using standard chemistry for pairedend fragment libraries and 75-35-bp read lengths.

SOLiD sequence alignment
The color-space reads (di-base encoded) were aligned with LifeScope™ (Applied Biosystems) using the genome assembly of P. adeliae [51] as a reference. The reference was translated into color-space with the aim of mapping the reads. The color-space reads helped to improve the quality of each base call, since each base was read twice during the sequencing step.
The consensus sequence was built from the binary alignment map (BAM) files obtained in the previous step. We used SAMtools [54] to obtain all bases mapped to each position, BCF tools to get the most probable genotype per position, and VCF utils to build the consensus sequence in FASTQ format. The FASTQ file was then converted to FASTA using SEQTK.

Mitochondrial sequence assembly
The mitochondrial genomes were assembled with the Denovo2 pipeline developed by Life Technologies using Velvet assembler version 1.2.09 [55] in color-space mode. ASiD (Assembly Assistant for SOLiD) was used to fill gaps between contigs that formed scaffolds and to perform a color-space to base-space translation for the final assembly. Individual genes were selected from each of the penguins' contigs using Sequencher software 5.1 (Gene Codes Corp., MI) and annotated by homology with genes of mitogenomes of the penguins previously published [48,51,56,57]. Due to the low number of reads from one individual of P. papua (Table 2), this individual was not included in the data analyses. All 11 mitogenome were submitted to GenBank (accession numbers KM891593; KU361803-KU361806; KU356673-KU356677).

Molecular evolution and selection
Sequences were aligned and polymorphic sites were confirmed by visual review of the chromatograms using Sequencher v. 5.1. (Gene Codes, Ann Arbor, MI, USA) and multiple alignments were done using ClustalX v. 2.1 [58]. For interspecific data analysis we compared the eleven mtDNA genomes with four other penguins genomes: S. demersus obtained from KwaZulu-Natal Province of South Africa (African penguin) [56], E. minor obtained from Nelson in New Zealand (Little blue penguin) [48], E. chrysocome (Rockhopper penguin) [57] and A. forsteri (Emperor penguin) [51].
The number of indels, polymorphic sites (S), nucleotide diversity (π), synonymous mutation rates (Ks), nonsynonymous mutation rates (Ka) and the Ka/Ks (or ω) ratio using the DNAsp v. 5 [59] were estimated for all 13 proteincoding genes from the complete mtDNA. For analyses of the reading frame of the genes, such as Ka/Ks, only the ND3 gene was considered a different reading frame as described by Mindell et al. [60]. All Ka and Ks estimates and corresponding ratios were obtained between pairwise species within Spheniscus and Pygoscelis genera, as well as between all five genera (Additional file 1 Table S1). Since we sequenced several species of Spheniscus and Pygoscelis genera, the values between genera were averages of the pairwise species Ka and Ks values for the genera comparison. We also calculated the pairwise distance between all 15 penguins studied here (10 species) for all 13 concatenated genes of the mtDNA genomes using the Maximun Likelihood model (Additional file 2 Table S2) with Mega v. 6.06 software [61,62] and Arlequin software in R package [63,64]. All positions containing gaps were eliminated, remaining a total of 10,874 positions in the final dataset.
Phylogenetic reconstruction was used to evaluate selection on each codon, gene and phylogenetic branch. We combined each of the 13 protein-coding genes from the mtDNA genome from the studied penguin species into a concatenated data set of 11,409 bp. The phylogeny was also estimated excluding the ND6 gene (a total of 10,886 bp), which is encoded by the light-strand [49] and has a significantly different base composition from the heavy-chain [65]. Phylogenetic relationships were inferred using Bayesian inference methods (BA) with Markov chain Monte Carlo (MCMC) sampling as implemented in MrBayes v. 3.1.2 [66,67]. The evolutionary model selected GTR + I + G, using jModelTest v. 0.1.1 [68] and Akaike Information Criterion (AIC). The Mr. Bayes analyses were run for one million generations while sampling every 100 generations. At this point the standard deviation of split frequencies was <0.01, indicating that both runs had converged. Phylogenetic reconstructions were visualized using Figtree v. 1.2.2 [69]. Although positive selection is frequently inferred by genetic data analysis as Ka/Ks > 1 (and purifying selection by Ka/Ks < 1) [70], a gene with Ka/Ks lower than 1 may show signature of positive selection at codon level. Evidence for signatures of selection at codon sites was evaluated for each gene with the concatenated data set using the BA phylogeny and the selected model of evolution without outgroups with the three algorithms available in HyPhy software [71][72][73]. For comparison, we used the Single Likelihood Ancestral Counting (SLAC), Fixed-Effects Likelihood (FEL) and the Random Effects Likelihood (REL) methods using the best fit nucleotide model [71]. The single likelihood ancestor counting (SLAC) method is based on weighting across all possible ancestral reconstructions, or employs sampling from ancestral reconstructions. The fixed effects likelihood (FEL) method estimates nonsynonymous and synonymous substitution rates at each site. Random effects likelihood (REL) models variation in nonsynonymous and synonymous rates across sites based on a predefined distribution, where a Bayesian approach are used to infer selection pressure at an individual site [71]. We also performed the branch Site REL [72] analysis in HyPhy software to estimate branch-specific episodes of positive selection along the phylogeny.

Environmental data and species distribution
Seasonal Chlorophyll concentrations (Chl) and Sea Surface Temperatures (SST) were acquired from Sea-viewing Wide Field-of-view Sensor (SeaWiFS) and Moderate Resolution Imaging Spectroradiometer (MODIS-Aqua) global seasonal databases that are based on a spatial resolution of 9 × 9 km [74]. The standard SeaWiFS chlorophyll product, ChlO C4 , is estimated based on an empirical maximal band-ratio algorithm that uses remote sensing reflectance measurements at 443, 490, 510 and 555 nm [75]. This algorithm relates the maximum of three remote sensing reflectance band-ratios (443/555, 490/555 and 510/555) to field measures of sea chlorophyll concentration [76] SST is based on MODIS-calibrated mid-and far-infrared radiances (IR), estimated using an algorithm that exploits differences in atmospheric transmissivity in the different IR bands [77]. The reliability of SST data is supported by its high correlation with sea surface temperatures measured during both oceanographic cruises and at local stations, with a slight positive bias at higher temperatures [78]. Here we use Chl and SST as proxies for the environmental conditions along the distributional range of each penguin species.
Satellite data (Chl and SST) were processed and saved in image format using SAGA -System for Automated Geoscientific Analyses (SAGA Development Team 2008). Penguin at-sea distribution during breeding were obtained from Borboroglu and Boersma [79] and digitized in vector format using a Geographic Information System QGis 2.6 [80]. Mean and standard deviation of seasonal Chl and SST products were obtained for each species by intersecting seasonal images with the vector files representing the geographical distribution of each species (Fig. 2). All data were re-scaled into a range from 0 to 1 to be comparable the Chl (Fig. 2a, c) and SST data (Fig. 2b, d).
Using this information, between-species pairwise Euclidian distances for Chl, SST and Chl-SST (combinated environmental distance COM) were calculated using the equation Eq. (1): where D xy is the Chl, SST or COM distance between two penguin species x and y based on the seasonal variability of Chl and SST. W, A, SP and S are the values of Chl, SST or COM in the winter, autumn, spring and summer season respectively.

Mantel test and GLM models
Mantel tests and GLM models (Generalized Linear Model) were implemented to explore the relationships between distance matrices of genetic parameters (π and ka/ks) and environmental parameters (Chl, SST and COM) as explanatory variables [81,82]. To study the relationship between genetic patterns and the environmental parameters (the distances across Chl, SST and COM dimension), the Mantel matrix distances were divided into three sub-matrices (Example in Additional file 3  Table S3), each describing the patterns between pairs of species within a bounded interval of Chl, SST and COM distances [81]. These bounded intervals consisted of a) short pairwise distances corresponding with the minimum to 33% of the data distance, b) middle distances corresponding with pairwise distances between 33% and 66% of the data distance, c) large distances consisting of the largest 33% of the pairwise distances and d) all pairwise data. For Mantel tests, p-values were obtained based on 10,000 simulations. Normality of the data (Ka/ Ks and π) was tested applying the Shapiro-Wilk test. The GLMs p-values and AICs were calculated considering a Gaussian error structure for all data under normal distribution and gamma error structure for data without normal distribution. All statistical analyses were implemented using R packages [64].

MtDNA genomes of penguins
The number of reads for each of the 12 genomes ranged from 52,990,428 to 158,565,602. One individual of P. papua was excluded because of insufficient reads (7,341,640). The number of reads attributed to mtDNA genomes ranged from 5049 to 25,947 (1647 for P. papua) and the percent coverage varied from 16 to 82 (Table 2). We recovered the full sequence from all mtDNA regions except the repetitive region of the D-loop.
The ND6 and 8 tRNAs are transcribed from the light strand, while the other 12 protein coding genes and 14 tRNAs and two rRNAs are positioned on the heavy strand. This pattern of transcription has been described for P. adeliae, S. demersus and other bird species [49, Fig. 2 Chlorophyll (Chl) and Sea Surface Temperature (SST) along most of the 10 species breeding distribution. Data were obtained from SeaWifs [75] and MODIS sensors [74]. Map of (a) Chl and (b) SST during the summer; and graph of the four seasons (c) Chl and (d) SST for each penguin species 56,83]. Indels, when comparing all ten species, were observed in five of the thirteen mtDNA genes (ATP8, CYTB, ND4, ND5 and ND6). All indels comprised three consecutive bases, such as for Pygoscelis and A. forsteri at ATP8; for E. chrysocome at CYTB; for A. forsteri at ND4 and ND6; and for E. minor at ND5.

Gene evolution and natural selection
The general pattern of Ka/Ks ratios for the different genes was similar between pairwise species at inter-genera comparisons (Spheniscus, Pygoscelis, Eudyptula, Eudyptes, Aptenodytes; Fig. 4; see Additional file 1 Table S1). The Ka/Ks ratios were all <1 (values under 1 suggest negative selection and over 1 positive selection using this method). ATP8 had the highest values and variability of Ka/Ks ratio among all species comparison between genera ( Fig. 4) with values between 0.1 and 0.4. ATP8 Ka/Ks ratios were lower between Pygoscelis and Spheniscus (0.09 to 0.15), and between A. forsteri and Spheniscus (0.14 to 0.16). High ATP8 Ka/Ks ratios were observed between E. minor and most other genera (0.24 to 0.41).
Markedly low values of Ka/Ks ratios were found for within and between pairwise genera comparisons and were observed for COX1, followed by COX2, COX3 and ND1, suggesting negative selection (Fig. 4). In the intragenus comparisons, the Ka/Ks ratios were zero for COX1, ND3 and ND4L among all Spheniscus species and were also low for COX2, ND1 and COX3. Similarly, within Pygoscelis comparisons, Ka/Ks ratios were lowest for COX1, followed by COX3, COX2, ND1 and ATP6. Therefore, COX1 had a signature of purifying selection for amino acid sequence conservation for all pairwise comparisons among penguin genera and species.
Similar results were detected using the SLAC, REL and FEL algorithms. Overall, significant levels of positive selection were detected for 69 different codon sites in four genes (COX1, ATP8, ND4L, ND4). Only one codon was under positive selection for ND4 using SLAC (codon 187) and one codon for ATP8 (codon 36) using FEL. We detected a large number of codons under positive selection for COX1 (a total of 39 codons) and 28 codons for ND4L using REL. Although REL methods have higher power and are more computationally intensive than SLAC methods of detecting selection, they have higher rates of false positives with small data sets [71,84]. FEL approaches appear to be more robust with fewer sequences than REL and SLAC that may underestimate the substitution rate [71]. Indeed, no codon had positive and significant values by SLAC for ATP8. The largest number of codons with positive, but not significant signatures were identified for this gene using this method (18% of codons), in contrast with patterns observed in other genes (1.4% in COX1 to 11% in ND2, ND5 and ND6, Figs. 5, 6). Low values of negative selection were obtained for ATP8 and ND4L using SLAC and FEL, and the most significant values were obtained for COX1 (Fig. 6). We identified 375 codons with significant negative selection using SLAC algorithms and 1787 codons using FEL algorithms. All of the codons identified by SLAC overlapped with those identified using FEL algorithms. We did not detect lineages under positive selection using a Branch REL analysis based on the phylogeny.

Environmental data
Chlorophyll concentration (Fig. 2c) was high in the area occupied by S. humboldti, and varied highly among seasons (2.2 mg/m 3 and 4.3 mg/m 3 in winter and summer respectively). Chlorophyll concentrations in the distributional range of S. demersus were also high, but more stable (from 2.3 mg/ m 3 to 2.5 mg/m 3 ). Highest SSTs were recorded throughout the range of S. mendiculus (32°C and 37°C during winter and summer respectively), followed by lower SSTs throughout areas occupied by S. demersus and S. humboldti 32.8°C; and 23.6-29.6°C during winter and summer respectively), with the lowest SST in areas where P. adelie, A. fosteri and P. antarcticus are distributed (0.5°C and 5.2°C during winter and summer respectively) (Fig. 2d). Nucleotide diversity (π) was not significantly correlated with environmental data at different spatial distances in any of 13 mtDNA genes for Mantel test (Additional file 4 Table S4). The Ka/Ks ratio was related with environmental data at different spatial scales for four genes supported by the significant values of Mantel test and GLM (ND4, ND3, ND5 and Cytb; Table 4) and three other genes only for GLM (ND1, COX1 and ATP8). A significant positive correlation between an environmental variable and Ka/Ks ratio for a specific mtDNA gene may suggest that species inhabiting similar environments undergo adaptation processes for the same gene (similar protein reflected by Ka/ Ks) and when inhabiting different environments undergo different gene adaptation processes (different protein reflected by Ka/Ks). The highest significant correlation was between SST and Ka/Ks ratio for ND4 (Mantel, p-value: 0.0001; GLM, p-value: 0.00001) for all distances. Although, chlorophyll concentration was significantly correlated with Ka/Ks ratio of ND1, COX1, ND5 for GLM, only COX1 showed signature of selection and low p-value for Mantel (p-value: 0.063) at short distances. The combined environmental data (COM) were correlated with the Ka/Ks ratio for three genes supported by significant Mantel test and GLM: ND3 (Mantel, p-value: 0.0304; GLM p-value: 0.0188), ND5 (Mantel, p-value: 0.0232; GLM p-value: 0.0095), and Cytb at short distance (Mantel, p-value: 0.0463; GLM p-value: 0.0103) and middle distance (Mantel, p-value: 0.0064; GLM p-value: 0.0112). All significant p-values for the Mantel test were also significant for GLM (Additional file 4 Table S4). Although we detected few significant correlations between Ka/Ks ratio and environmental variables, we cannot reject the possibility that these results are influenced by other variables not evaluated here.  Nucleotide diversity among penguin species was not correlated with the environmental variables (no significant value for Mantel test), nevertheless Ka/Ks ratios were significantly correlated. This supports the premise that broadly, genetic diversity of mtDNA genes are conserved, but that specific signals of selection related to environmental variables could be identified by comparative estimation of Ka/Ks ratios among genes.
Assessing genes under selection, ND4 and COX1 had the highest Ka/Ks ratio between species from distinct environments and the lowest ratios between species from the same environment (Figs. 4, 7). In contrast, ATP8 had the highest Ka/Ks ratio between similar environments and the lowest between different ones.

Discussion
Our analyses revealed strong and contrasting signals and selection patterns in the coding mtDNA genes of closely related penguin species, some of which correlated with patterns of sea surface water temperatures and chlorophyll concentrations. We identified a widespread signature of purifying selection (Ka/Ks < 1) across the mitochondrial genome, consistent with expected patterns if purifying selection were constraining the mitogenome evolution to maintain the OXPHOS proteins and functionality [20,70]. These patterns included a signature of purifying selection in COX genes, especially COX1, which had Ka/Ks ratios close to zero for all comparison among species and genera. In contrast, one codon under positive selection was evidenced for ND4 and one for ATP8 (SLAC and FEL analysis respectively). Sea surface temperature (SST) was strongly correlated with the Ka/Ks ratio of ND4 and chlorophyll concentration was correlated with Ka/Ks ratios of COX1. The Ka/Ks ratio patterns for ATP8 varied widely among genera and there was only weak or no evidence of a correlation between genetic diversity of ATP8 and measured environmental factors. These results identify mtDNA candidate genes under selection which could be involved in broad-scale adaptations of penguins to their environment.

Evidence linking selection and gene function NADH dehydrogenase
In metazoans, mitogenome complexes I through V are physically positioned relative to each other in accordance with their function, with the complex II genes residing in the nucleus [85]. A positively selected codon was detected by SLAC model on the mitogenome of penguins in the Complex I or NADH dehydrogenase gene (ND4). Complex I through IV generate the proton gradient in the intermembrane space of mitochondria. Complex I is the main entrance point of electrons and the largest complex of the respiratory chain. It catalyzes the transfer of two electrons from NADH to ubiquinone and translocates four protons across the inner mitochondrial membrane [7]. Complex I consists of different protein subunits, with fourteen homologous subunits present in all organisms, and which constitute the enzyme. ND1 and ND2 are situated at the connection between the peripheral and the membrane arm, whereas ND4 and ND5 are at the distal end [86,87]. If ND2, ND4 and ND5 are the actual proton pumping devices [86], mutations in these genes may alter the efficiency of the proton-pumping process. The concentration of codons under positive selection in specific domains of the mitogenome are very likely to be related to protein function [20,88] and a meta-analysis of 237 vertebrates showed that most of the genes under positive selection were located in Complex I [89]. NADH dehydrogenase genes have also shown signatures of positive selection at a population level in birds (Parus spp.) in the ND2 gene [90], in lineages of an Australian bird (Eopsaltria australis) [20] in ND4 and ND4L. Similarly the Chinese snubnosed monkey (Rinopithecus ssp.) showed evidence of adaptive variation related with physiological hypoxia and cold stress in altitude in ND2 and ND6 [91]. Few studies have assessed if selection on mtDNA genes is correlated with environmental variables [92]. In the Atlantic herring (Clupea harengus), significant positive selection was observed in ND2, ND4 and ND5. Correlations between genetic diversity and variation of environmental factors (temperature, salinity and latitude) were generally weak, although the authors addressed some correlation between temperature and genetic distances with ND3 and ND4L [92].

ATP synthase
In our study, the Ka/Ks ratio for ATP8 was lower than 1, suggesting negative selection for the gene, although it was higher than any other value for mtDNA genes in penguins. However, one codon of ATP8 (Complex V) showed evidence of positive selection (under model FEL). Complex V (or ATP synthase) is the final enzyme in the OXPHOS pathway and is ubiquitous throughout all forms of life and biological functions. The enzyme uses the energy stored in a proton gradient across a membrane to drive the synthesis of ATP from ADP and phosphate. Complex V uses the proton gradient resulting from the activity of the respiratory chain enzymes complexes I, III and IV for ATP synthesis, generating 95% of cellular ATP. Complex V is a multisubunit complex with two functional domains, F1 and F0, for which ATP8 encodes a core subunit of the F0 component. Across mammalian evolution, ATP6 is more conserved, in contrast to the ATP8 subunit which has some highly variable sites with extreme amino acid changes of properties, suggesting that there are differences in its regulatory function among species [88].

Cytochrome c oxidase
COXI, complex IV, showed purifying selection in penguins, as has been observed across all Aves [32]. We estimated Ka/Ks ratios close to zero for all comparison among species and genera, and similar results were obtained using both the SLAC and FEL methods. These genes from complex IV COX1 ATP8 ND4 Fig. 7 Results of signature of selection in the penguin mtDNA genome. Ribosomal RNAs (blue), Transfer RNAs (green and purple lines), Protein-coding genes with the oxidative phosphorylation system complex are represented by different colors. ND6* is transcribed from the light strand. Percentage of codons under negative selection is shown for each gene and numbers in blue highlight the relatively low percentage for ATP8. Each star represents codons under significant positive selection as determined by FEL (yellow star) and REL (orange star) in ATP8 and ND4 genes. Ka/Ks ratio for ND4 is correlated with superficial surface temperature and COX1 for chlorophyll levels. The four graphics around the genome indicate with red arrows the highest and lowest for Ka/Ks ratio comparison between species from different genus for each of the four genes under selection. COX1 and ND4 highest Ka/Ks values are associated with pairwise ratio between species from different climates, and lowest among those from similar climates. With ATP8, highest Ka/Ks values were associated with pairwise ratios between species from similar climates and lowest among those from from distinct climates. Scale for graphic of ND3 and ATP8 genes are Ka/Ks from 0 to 1, and for COX1 and ND4 are from 0 to 0.1. By Juan Pablo Bravo represent the terminal complex of the respiratory chain. It catalyzes electron transfer from cytochrome c to molecular oxygen, reducing the latter to water. It has been postulated that Complex IV of the OXPHOS, in which COX1 is one of the two catalytic subunits, has the property of intrinsic uncoupling [93], through which it regulates and increases the coupling efficiency to produce ATP or heat. This mechanism is consistent with the negative selection observed for this gene, as adaptations can occur by changing the configuration of the protein, a mechanism that does not occur in complexes I and V. Further studies are required to completely understand the factors constraining the evolutionary rate of mitogenomes in penguins. Evolution of protein sequences depends on functional constraints including differential gene expression, protein folding and connectivity, functional importance, and expression breadth among tissues [94]. Here, we discussed the implications of functionality of different mtDNA genes on the evolutionary rate of a protein. Alternatively, Zang and Yang [94], suggest that this rate may be predominantly influenced by its expression level rather than functional importance.

Mitogenomics and ecological-evolutionary hypotheses to explain selection
There are two hypotheses generally offered to explain patterns of mitochondrial genome evolution. The first is that temperature is a primary driver of selection and the evolution of mtDNA genes. For penguins, speciation and adaptation are seemingly strongly correlated with temperature differences of ocean water between the Antarctic and the tropics [35][36][37][38]. The mitochondrial OXPHOS generates both energy and heat, with the balance of each depending on the coupling efficiency of the electron transport chain to the ATP synthase activity [2,3]. Less efficient energy production and more heat production would be beneficial for survival in cold environmental conditions [2]. A reduced proton leak would lead to incremental cold sensitivity that effectively constrains the biogeographic distribution of species with this pattern of mtDNA variation [4,5]. Selection on mtDNA variation influencing mitochondrial bioenergetics would thus be plausible drivers of adaptation and speciation [6], and indeed we detected correlation between Ka/Ks from ND4 and sea suface water temperature which should be further investigated.
A second hypothesis is that the risk of starvation associated with restricted food availability (less abundant or less reliable food source) drives mtDNA gene evolution, since starvation can generate more efficient coupling of energy production by the OXPHOS pathway [7]. Three mechanisms have been suggested for the evolution of starvation resistance and each may be influenced by mitochondrial diversity: (a) greater energy reserves (lipid stores); (b) reduced rate at which reserves are utilized; (c) and lowering of the minimal resources required for survival [7,9]. Reducing energy requirements may occur by reducing ROS and heat production and thus increasing OXPHOS efficiency [8,10]. Chl can be considered to be a measure or correlate of primary productivity and therefore resource availability in the Antarctic marine food web. Penguins as top predators, are dependent on the productivity of lower trophic levels. Therefore, the high variation in productivity of the habitats throughout the Spheniscus species distributional ranges may have led to the evolution of more efficient coupling in those species, compared to the stable environments of Emperor and Adélie penguins. For example, high variability in levels of Chl is observed within the range of S. humboldti, and becomes further exacerbated during El Niño events, leading to a higher probability of mortality due to starvation. In this circumstance, selection may favor OXPHOS efficiency rather than heat production. During El Niño events, the upwelling of nutrients is interrupted, reducing food availability and increasing water temperature in Chilean-Peruvian waters [95]. This affects fish abundance, causing increased Galapagos and Humboldt penguin mortality, nest desertion and large-scale movements [40,42,96]. Although Chl could be an important variable for mtDNA evolution, only a weak correlation between Ka/Ks ratio of COXI and Chl across small geographic scales was detected here. Therefore, our results should be further investigated in penguins not only at a species level, but also in populations of each penguin species across a gradient of sea surface temperatures (e.g. across latitudes) and chlorophyll levels.
Although ATP8 was not correlated to SST and Chl here, the Ka/Ks ratio values were higher than in any other gene compared. Moreover, it showed a remarkable pattern for all comparisons among species distributed in extremely different environments (tropical and hot temperate climate versus polar climate). We found contrasting results for ATP8. A. forsteri and Spheniscus had the lowest values for ATP8 genes in pairwise comparisons among all genera. This may be indicative of different mechanisms of adaptation among genes and species depending on their individual habitat requirements. However, we cannot discount the possibility that this gene is associated with other selective variables, which in turn are correlated with climatic and productive variations observed in the distributional ranges of penguins. Indeed, it was recently suggested that evolution of mitochondrial genes might be linked with pathogen variability [11].
An increasing number of studies have implicated mitochondria with innate immune responses, including the control of apoptotic signaling, sterile inflammation, and antibacterial and antiviral response [12][13][14][15]. Although it is uncertain what direct role mitochondrial encoded genes have in immune response, it bears further evaluation. Antarctica and the tropics differ considerably in virulence, ensemble, abundance, development and diversity of pathogens. Just recently, the influenza virus was detected in penguins in Antarctica [97][98][99]. Antarctic wildlife is particularly susceptible to infectious agents because of its geographic isolation and extreme environment, which increases the likelihood that the fauna has evolved under low "pathogen pressure" and are thus immunologically naïve [11].

Implications for future adaptation to changing environments
Ongoing human-induced climatic changes are predicted to have large impacts on the world biota, including penguin species [44][45][46]100]. Pygoscelis species appear to be changing their distributions. P. papua populations have been increasing along the Antarctic Peninsula in association with an abrupt increase of water temperatures, while populations of P. adeliae and P. antarcticus have been decreasing [44][45][46]. The decline and disappearance of an A. forsteri colony was associated with the rise in local mean annual air temperature and decline in seasonal sea ice duration [101]. S. magellanicus populations have experienced increasing reproductive failure, measured in chicks' mortality associated with starvation, rain, heat, and predation, which has been associated with climate change and an increase in frequency and intensity of storms [102]. S. humboldti and S. mendiculus have been historically affected by strong El Niño events causing high mortality, nesting delay, nest flooding and breeding failure [40-42, 96, 103]. El Niño events are becoming more intense and frequent due to global climate change [104,105]. The decreasing food availability for penguins in South America during El Niño events [95], and for penguins in Antarctica during the past 50 years [106] has caused not only direct mortality. Increased mortality has also been attributed to compromised immune systems, since adequate nutrition is fundamental to maintaining a functioning immune system [107]. The expanding human activities in Antarctica also increase the risk of introduction of infectious diseases into this naïve wildlife community [108]. Therefore, selective pressures on mtDNA genes by environmental factors such as temperature, food availability and novel pathogens could intensify significantly with climate change.
The effect of climate change on gene evolution does not have to be direct. As penguins are top marine predators, changes at all trophic levels will affect penguin population demography [109]. However, understanding the genes involved in adaptation to climate by penguin species may permit us to predict how the species are physiologically limited and how they may be able to respond to environmental changes such as climate warming.