The adaptive evolution of the mammalian mitochondrial genome
© da Fonseca et al. 2008
Received: 01 September 2007
Accepted: 04 March 2008
Published: 04 March 2008
The mitochondria produce up to 95% of a eukaryotic cell's energy through oxidative phosphorylation. The proteins involved in this vital process are under high functional constraints. However, metabolic requirements vary across species, potentially modifying selective pressures. We evaluate the adaptive evolution of 12 protein-coding mitochondrial genes in 41 placental mammalian species by assessing amino acid sequence variation and exploring the functional implications of observed variation in secondary and tertiary protein structures.
Wide variation in the properties of amino acids was observed at functionally important regions of cytochrome b in species with more specialized metabolic requirements (adaptation to a low-energy diet or large body size, as in elephant, dugong, sloth, and pangolin, and adaptation to unusual oxygen requirements, for example diving in cetaceans, flying in bats, and living at high altitudes in alpacas). Signatures of adaptive variation in the NADH dehydrogenase complex were restricted to the loop regions of the transmembrane units, which likely function as proton pumps. Evidence of adaptive variation in the cytochrome c oxidase complex was observed mostly at the interface between the mitochondrial and nuclear-encoded subunits, perhaps evidence of co-evolution. The ATP8 subunit, which has an important role in the assembly of F0, exhibited the highest signal of adaptive variation. ATP6, which has an essential role in rotor performance, showed high adaptive variation in predicted loop areas.
Our study provides insight into the adaptive evolution of the mtDNA genome in mammals and its implications for the molecular mechanism of oxidative phosphorylation. We present a framework for future experimental characterization of the impact of specific mutations in the function, physiology, and interactions of the mtDNA encoded proteins involved in oxidative phosphorylation.
Mitochondrial DNA (mtDNA) has long been treated as an ideal marker because of its convenience for reconstructing gene genealogies and inferring population history. However, the assumption that mtDNA is selectively neutral is simplistic, because variation in mitochondrial protein-coding genes involved in oxidative phosphorylation (responsible for the production of up to 95% of the energy of eukaryotic cells) can directly influence metabolic performance. Because of the importance of this biochemical pathway, evaluating selective pressures acting on mtDNA proteins could provide key insight into the adaptive evolution of the mtDNA genome, as has been suggested by recent empirical evidence (e.g. Ruiz-Pesini et al. 2004, Moyer et al. 2005, Bazin et al. 2006). As amino acid changes cause inefficiencies in the electron transfer chain system, oxidative phosphorylation produces reactive oxygen molecules, causing oxidative damage to mitochondrial and cellular proteins, lipids and nucleic acids, and eventually interrupting the production of mitochondrial energy.
Examples of mutations in mitochondrial genes that cause exercise intolerance in humans.
First N-terminal membrane-spanning region of COXII: a structural association of COXII with COXI is necessary to stabilize the binding of heme a3 to COXI.
Loss of the last 13 amino acids of the highly conserved C-terminal region of this subunit.
Qo binding pocket
Heme bh and Qi binding pocket
Disrupts helix cd1, Qo site
Disrupts helix A, Qi pocket
The 41 mammalian species in study.
Cape Golden Mole
cape rock hyrax
savanna or grassland; scrub forest.
Syria south through NE Africa through most of sub-Saharan Africa. Isolated mountains in Libya and Algeria.
African savanna elephant
Sahara Desert to the south tip of Africa, from the Atlantic (western) coast of Africa to the Indian Ocean in the east
high; 3670000 (from Elephas maximus)
Elephantulus sp. VB001
savannas, deserts, thornbush and tropical forests
low; 51 ( Elephantus AVE)
short-eared elephant shrew
desert and semi-desert areas
invertebrates and herbivorous diet
Namibia, southern Botswana, and South Africa
tropical marine coastal water
east Africa, northern coast of Australia, island groups of the South Pacific
small Madagascar hedgehog
dry forests, scrub, cultivated areas, dry coastal regions and semidesert
invertebrates; also baby mice
dry savanna to rain forest
Malayan flying lemur
Thailand and Indochina
savanna or grassland; forest
areas of broken rock and talus; taiga; mountains
central British Columbia to south-central California and east to Colorado
domestic guinea pig
greater cane rat
along river banks and near marshes
KwaZulu-Natal; Gauteng and the Northern Province; Mpumalanga
originally distributed from the Mediterranean region to China; spread throughout the world
temperate; tropical; desert; savanna or grassland; chaparral; forests; mountains
native to northern China; can be found on every continent of the world except Antarctica
Eurasian red squirrel
Europe and northern Asia
medium; 532 ( Sciurus AVE)
northern tree shrew
medium; 123 (from T. glis)
medium; 8860 ( Canis AVE)
high; 13200 (Felidae AVE)
warm tropical waters to arctic waters
carnivores (crustaceans, plankton, and small fish)
North Pacific Ocean, North Atlantic Ocean, southern Hemisphere
carnivore (fish, squid, octopus and small crustaceans)
Icelandic waters and in the North Sea.
shallow water; can live in cold climates (but not frozen water)
altitude of 3500 to 5000 meters above sea-level
Ryukyu flying fox
medium; 492 ( Pteropus AVE)
humid dark roosts
very ripe fruit
Africa, Egypt to Turkey, Cyprus, Arabian peninsula east to Pakistan
Jamaican fruit-eating bat
herbivore; also insects
central Mexico to Bolivia and central Brazil through the Greater and Lesser Antilles
New Zealand long-tailed bat
Low; 18 (from C. gouldii)
western European hedgehog
savanna or grassland; forest
region, except the Himalayas and North Africa
wet grasslands to montane forests
seeds, insects, nuts, worms
along the Pacific coastline of Siberia
savanna or grassland; forest
throughout temperate Europe to east in Russia
savanna or grassland; chaparral ; forest
bulk of their diet is herbivorous; aquatic organisms
Uganda to Senegal and Angola
savanna or grassland; forest
invertebrates; occasionally birds, small mammals, fruits
Peru and northern Argentina to the south-central and southeastern United States. It is also found on the islands of Grenada, Trinidad and Tobago
southern two-toed sloth
Central America and northern South America
medium; 3770 (from C. hoffmanni )
savanna or grassland; forests; at elevations to 2000 m
Another region that shows radical amino acid variation is helix cd2, close to the hinge region of ISP (Figure 9). Mutations in the hinge region were shown to have drastic consequences for the catalytic activity of ISP, by hindering the conformational changes that are required for cytochrome bc1 function. Two species have a proline residue at site 158: the greater cane rat, a rodent with spiny fur on the back, and the pangolin, the scaly anteater, which has a low metabolic rate because of the combination of an invertebrate diet and a large body size. Sites 16, 159, 162 and 263 show an elevated number of amino acid changes (Figure 10), suggestive of adaptive relevance in many mammalian species. Site 277 presents a highly conserved alanine residue. It is placed in the middle of helix F1, within the Q0 pocket (Figure 9). Two peculiar species (dugong and alpaca) with distinct metabolic requirements show very radical amino acid changes at this site. The dugong is an aquatic mammal that is more closely related to elephants than to other marine mammals. It is sometimes referred to as a sea cow because of its strict sea-grass diet, and it combines several interesting features from the metabolic point of view: a large body size, a low-energy diet and adaptation to an aquatic environment. It has an arginine residue at position 277, which will not only cause extra steric hindrance because of the size of the side chain relative to alanine (in Figure 9 the van der Waals surface of an arginine side chain is presented; it clearly overlaps the stigmatellin binding position), but will also change the binding mode of the ligand, as it is positively charged. Finally, an important change was detected in the alpaca, a domesticated breed of South American camel-like ungulates that lives at an altitude of 3500 to 5000 meters above sea level and presents metabolic adaptations to the low-O2 environment. This species has a proline at site 277, which will drastically alter the local secondary structure, disrupting the alpha helix and therefore changing the shape of the Q0 pocket. Such a mutation is not present in the closest relatives of the alpaca [see Additional file 1, Fig. S5]. Curiously, the Old World members of the Camelidae family (the dromedary Camelus dromedarius and the bactrian camel C. bactrianus) have an aspartate at position 16, instead of the asparagine exhibited by the four species that inhabit South America (alpaca Lama pacos, guanaco L. guanicoe, llama L. glama, and vicuna Vicugna vicugna) [see Additional file 1, Fig. S5]. Several mutations in human CytB have been related to exercise intolerance (Table 1; Figure 9), all of which have similar chemical effects, including a change in the net charge around the heme groups and in the binding pockets and disruption of local secondary structures close to the substrate binding areas.
Variable amino acid sites located in COXI/II/III. These sites, located on the mtDNA-encoded subunits of COX (subunits I, II and III), show significant variation in amino acid properties and are in contact with the nuclear-encoded subunits of Complex IV (subunits IV, VB, VIA, VIB, VIIA, VIIB, VIIC and VIII).
38, 122, 155, 182, 185, 230
The ATP8 gene encodes a core subunit of the F0 component of ATPase. In its absence, the ATPase in yeast contains no ATP6 subunit, which suggests an important role in the assembly of F0. Nevertheless, the ATP8 subunit has some highly variable sites (Figure 12B) and presents the highest average number of radical changes in amino acid properties per residue, suggesting some variation of its regulatory role across species.
The variation in metabolic rates among species is a consequence of multiple factors, including the need to maintain body temperature, the number of mitochondria and their volume densities and/or cristae surface, and the fact that relative organ mass and organ metabolic rate vary interspecifically. For example, in reptiles, the lower metabolic rates, compared to mammals, are due to a combination of smaller internal organs and lower mitochondrial volume and cristae surface densities.
The scaling of metabolic rates is thus an intricate issue, and even recent multiple-cause models are flawed. Adding to the complexity of analyzing interspecies metabolic rates is the random accumulation of variation in the coding sequences of proteins directly involved in energy production, together with the differential selective pressures that arise as mutations affect mitochondrial ATP production.
We present a mammalian phylogeny based on variation in protein-coding mtDNA genes among 41 representative species. Sequence analyses were complemented with functional analyses to assess the potential importance of mutations leading to radical changes in the physicochemical properties of the amino acids. Most of the mtDNA protein-coding genes were extremely conserved, reflecting their vital role in oxidative phosphorylation. However, much of the observed variation had plausible adaptive significance.
The ND2, ND4, and ND5 complex I genes showed higher than average adaptive variation, with all of the variable sites located in the assessed loop regions of these putative proton pumps (3D structural data are needed to further confirm these interpretations and to measure the functional implications).
The available high-resolution 3D structure of CytB facilitated interpretation of the functional implications of mutations in portions of the protein that showed extreme variation in amino acid properties in species with peculiar metabolic requirements (such as adaptation to a low-energy diet or large body size, namely in elephant, dugong, sloth, and pangolin; and adaptation to extreme O2 requirements, i.e. diving in cetaceans, flying in bats, and high-altitude resistance in alpacas). The adaptive variation in COX was restricted mostly to the interface between mitochondrial and nuclear-encoded subunits, suggesting either co-evolution or some influence on the regulatory role of the latter. Among the ATPase subunits, ATP8, which has an important role in the assembly of F0, showed the highest amount of adaptive variation in this analysis. ATP6, which has an essential role in ATPase rotor performance, showed high adaptive variation in predicted loop areas. Interpretation of possible functional roles of these changes is limited, however, by the lack of experimental and structural data for these genes.
Our study provides insight into the adaptive evolution of the mtDNA genome in mammals, which may have facilitated the successful radiation and diversification of mammalian species into different environments and habits. The evidence of positive selection acting on important functional regions of the various mammalian mtDNA proteins provides the framework for future experimental characterization of the impact of specific mutations on the function, physiology, and interactions of the mtDNA-encoded proteins involved in oxidative phosphorylation.
A mammalian mitogenomic phylogeny was constructed using 12 of the 13 protein-coding genes of the mtDNA genome of 41 species representative of all mammalian orders (Table 2). The ND6 gene was excluded because it is encoded by the light strand, which has a significantly different base composition from the heavy strand. Gaps and ambiguous sites adjacent to gaps were removed, resulting in a total alignment of 10,587 nucleotides (3,529 amino acids). The third codon position was excluded from the phylogenetic analysis (7,058 nucleotides were used) because of observed nucleotide saturation [see Additional file 1, Fig. S1].
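The alignment sizes quoted above follow directly from codon arithmetic. The short sketch below checks them and shows one way the third codon position could be stripped from an in-frame coding sequence; the function name `drop_third_codon_positions` is hypothetical and is not taken from the software used in the study.

```python
def drop_third_codon_positions(cds: str) -> str:
    """Return an in-frame coding sequence with every third nucleotide removed."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Positions 0 and 1 of each codon are kept; position 2 is discarded.
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


# Figures reported in the text: 10,587 aligned nucleotides.
alignment_length = 10_587
codons = alignment_length // 3          # number of amino acid sites
retained = alignment_length - codons    # first + second codon positions

print(codons, retained)  # 3529 7058
```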
Bayesian inference methods with Markov chain Monte Carlo (MCMC) sampling were used in MrBayes [25, 26] to assess phylogenetic relationships among the species. We used a General-Time-Reversible substitution model with the invariant site plus gamma options (five categories) after determining the optimal model of sequence substitution with MrModeltest 2.2. One cold and four incrementally heated chains were run for 2,000,000 generations, with chains i = 2, 3, 4, and 5 incrementally heated, the heat being 1/(1 + [i - 1]T) with T = 0.2. Trees were sampled every 100 generations from the last 1,000,000 generated (well after the chain reached stationarity) and 10,000 trees were used for inferring Bayesian posterior probability. The burn-in fraction performance was evaluated using the program Tracer v1.4 (http://tree.bio.ed.ac.uk/software/tracer/). Bayesian methods have been successfully applied to estimation of the tree topology of placental animals using both mitochondrial and nuclear data. A maximum likelihood phylogenetic tree was constructed in PAUP 4.0b10 after determining the optimal model of sequence substitution (TVM+I+G) with Modeltest 3.04.
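As a quick illustration of the heating scheme described above, the sketch below evaluates 1/(1 + [i - 1]T) for the five chains with T = 0.2 and confirms the number of sampled trees. The function name `chain_heat` is hypothetical; it only reproduces the formula given in the text and is not MrBayes code.

```python
def chain_heat(i: int, t: float = 0.2) -> float:
    """Heat of chain i (i = 1 is the cold chain) under 1 / (1 + (i - 1) * T)."""
    if i < 1:
        raise ValueError("chain index starts at 1")
    return 1.0 / (1.0 + (i - 1) * t)


# One cold chain (i = 1) and four heated chains (i = 2..5).
heats = [round(chain_heat(i), 4) for i in range(1, 6)]
print(heats)  # [1.0, 0.8333, 0.7143, 0.625, 0.5556]

# Sampling every 100 generations over the last 1,000,000 generations.
sampled_trees = 1_000_000 // 100
print(sampled_trees)  # 10000
```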
Selection in protein-coding genes is generally assessed by estimating ω, the ratio between nonsynonymous and synonymous substitution rates (dN/dS). However, this statistical approach for detecting molecular adaptation is largely biased against even moderately conservative proteins, as it does not allow for the possibility that adaptation may come in the form of very few amino acid changes. Thus, significant physicochemical amino acid changes among residues in mitochondrial protein-coding genes were identified by the algorithm implemented in TreeSAAP, which compares the observed distribution of physicochemical changes inferred from a phylogenetic tree with an expected distribution based on the assumption of completely random amino acid replacement expected under the condition of selective neutrality. The evaluation of the magnitude of property change at nonsynonymous residues and their location on a protein 3D structure may provide important insight into the structural and functional consequences of the substitutions. Eight magnitude categories (1 to 8) represent one-step nucleotide changes in a codon and rank the corresponding variation in a property scale of the coded amino acid. Categories 1 to 3 indicate small variation in the amino acid characteristics, while categories 6 to 8 represent the most radical substitutions. By accounting for the property changes across the data set, a set of relative frequencies of change for each category is obtained, which allows the null hypothesis to be tested under the assumption of neutral conditions. The categories for which the observed number of amino acid replacements in the data set is significantly different from the null model (z-scores > 1.645; P < 0.05) are considered as being potentially affected by selective pressures. Here we focus on amino acid differences that correspond to radical physicochemical variation (positive-destabilizing selection) and are expected to be linked with significant changes in function.
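The significance criterion above (z-score > 1.645, one-tailed P < 0.05) can be illustrated with a simple normal approximation for the count of changes falling in one magnitude category. This is only a sketch under stated assumptions: the binomial model, the function names, and the example numbers are hypothetical, and the exact statistic computed by TreeSAAP may differ.

```python
import math

CRITICAL_Z = 1.645  # one-tailed threshold quoted in the text (P < 0.05)


def category_z_score(observed: int, total: int, expected_prop: float) -> float:
    """z-score of an observed category count against a neutral expectation.

    Assumes (for illustration) that each of `total` changes falls in the
    category with probability `expected_prop` under neutrality.
    """
    if not 0.0 < expected_prop < 1.0:
        raise ValueError("expected_prop must lie strictly between 0 and 1")
    mean = total * expected_prop
    sd = math.sqrt(total * expected_prop * (1.0 - expected_prop))
    return (observed - mean) / sd


def is_significant(observed: int, total: int, expected_prop: float) -> bool:
    """True when the category is over-represented at the one-tailed 5% level."""
    return category_z_score(observed, total, expected_prop) > CRITICAL_Z


# Hypothetical numbers: 200 inferred changes, 5% expected in category 8,
# 18 observed there.
print(is_significant(18, 200, 0.05))
```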
TreeSAAP categorizes each amino acid site as positively or negatively destabilizing using 31 properties (henceforth amino acid positions will be referred to as sites). To detect strong directional selective pressure, only changes corresponding to categories 7 and 8 (the two most radical property change categories) at the P ≤ 0.001 level were considered. The total number of changes per site is the sum of those occurring in each branch of the phylogeny. The number of changes in amino acid properties was standardized relative to the overall size of the protein when comparing different complexes (weight factor = total number of amino acids in the complex/total number of amino acids in ATPase, which is the smallest protein complex).
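The size standardization can be sketched as follows. The weight factor is the ratio given in the text; dividing the raw count by that factor is an assumption about how the weight was applied, and the lengths and counts in the example are hypothetical placeholders rather than values from the study.

```python
def weight_factor(complex_length: int, atpase_length: int) -> float:
    """Weight factor = amino acids in the complex / amino acids in ATPase."""
    if complex_length <= 0 or atpase_length <= 0:
        raise ValueError("lengths must be positive")
    return complex_length / atpase_length


def standardized_changes(n_changes: int, complex_length: int,
                         atpase_length: int) -> float:
    """Scale a raw count of radical changes by the weight factor.

    Assumption: the count is divided by the weight factor, so that larger
    complexes are not favoured simply because they have more sites.
    """
    return n_changes / weight_factor(complex_length, atpase_length)


# Hypothetical example: a complex twice the size of ATPase with 40 changes.
print(weight_factor(600, 300))             # 2.0
print(standardized_changes(40, 600, 300))  # 20.0
```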
The functional relevance of the amino acid mutations was discussed in the context of existing three-dimensional (3D) structures of mtDNA-encoded proteins (CytB [PDB:1PPJ]; COX [PDB:1V54]). For those proteins with unknown 3D structures (ND and ATPase), topologies for transmembrane (TM) subunits were predicted using hidden Markov model (HMM) based servers for topology prediction of transmembrane proteins [68–71]. The algorithm used by the program PRODIV-TMHMM has proven to be very reliable at predicting 3D topologies, as it incorporates evolutionary information from multiple sequence alignments and assigns amino acid residues to different TM regions according to their properties. However, since even homologous sequences from the same protein family can have inverted topologies, some caution is necessary when using topologies predicted by these automated approaches. We have therefore delineated putative TM domains by integrating the results from PRODIV-TMHMM with two other reliable HMM-based methods, TMHMM [68, 69] and HMMTOP. Graphic representations of the 3D structures were created with the program VMD.
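The text does not state how the three topology predictions were integrated. One simple possibility, shown here purely as an illustrative assumption, is a per-residue majority vote over label strings (`i` = inside, `M` = membrane, `o` = outside). The function names and the toy label strings are hypothetical and do not reproduce output from PRODIV-TMHMM, TMHMM, or HMMTOP.

```python
from collections import Counter


def consensus_labels(predictions: list[str]) -> str:
    """Per-residue majority vote; '?' marks residues without a majority."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must cover the same residues")
    out = []
    for column in zip(*predictions):
        label, votes = Counter(column).most_common(1)[0]
        out.append(label if votes * 2 > len(column) else "?")
    return "".join(out)


def tm_segments(labels: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) ranges of consecutive 'M' residues."""
    segments, start = [], None
    for pos, label in enumerate(labels, start=1):
        if label == "M" and start is None:
            start = pos
        elif label != "M" and start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


# Toy predictions from three hypothetical predictors for an 8-residue stretch.
toy = ["iiMMMMoo", "iiiMMMoo", "iMMMMMoo"]
consensus = consensus_labels(toy)
print(consensus, tm_segments(consensus))  # iiMMMMoo [(3, 6)]
```

A majority vote keeps a residue in a TM segment only when at least two of the three predictors agree, which is one conservative way to respect the caution about automated topology predictions noted above.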
nicotinamide adenine dinucleotide
cytochrome c oxidase complex
This work was supported in part by the Project POCTI/BSE/47559/2002 and PTDC/BIA-BDE/69144/2006 from the Portuguese Foundation for Science and Technology (FCT) and by the National Geographic Society Grant 7483-03. RF is funded by FCT fellowship SFRH/BPD/26769/2006. This project has been funded in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract N01-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. Comments made by three anonymous referees improved a previous version of this manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.