Aneides phylogeny estimated from nuclear genes
Despite recent work on salamander phylogenies [38, 39, 52, 53], the phylogenetic relationships within Aneides were previously unknown. From our analysis of 2046 bp from three nuclear genes (BDNF, POMC, and RAG1), we estimated a resolved phylogeny for the genus that includes all six species represented by several divergent lineages (Fig. 1). Our results show A. aeneus to be sister to the rest of the clade; A. aeneus is found in eastern North America, while the remaining five species are only found in western North America. The two Pacific Northwest species — A. ferreus and A. vagrans — are sister taxa, which together form the clade sister to A. flavipunctatus. Aneides flavipunctatus + A. ferreus + A. vagrans is sister to A. lugubris; these four species are found in the coastal regions of Western North America. Aneides hardii, which is restricted to a few isolated mountaintops in southern New Mexico, is sister to the coastal clade. Our results are consistent with the relationships suggested by other studies that included some species of Aneides [32, 38, 39, 53].
Mitochondrial gene order
We identified mitochondrial gene orders for nine Aneides samples using shotgun sequence data. The total amount of sequence and average read length varied among samples, but overall, the datasets represent ~ 0.5–1.5% of the nuclear genomes at 1× coverage (Table 2). We produced nearly complete mitochondrial genome sequences from a single contig for two samples. For each of the remaining samples, we produced four to eight contigs that represent large fractions of the mitochondrial genome. We eliminated several assembly gaps using PCR; however, several regions of the genome remain absent from our assemblies. Specifically, the control region and the region between CYTB and ND6 were difficult to assemble, likely reflecting the highly repetitive nature of these two regions [19]. Despite these gaps, our contigs include all 37 mitochondrial genes for each individual; thus, we were able to infer gene order from all nine individuals with confidence. We combined these data with previously published genomes from two additional Aneides individuals (one A. hardii and one A. flavipunctatus) for a total of 11 focal Aneides mitochondrial genome sequences (Table 2).
We found evidence of gene rearrangement, relative to the typical vertebrate mitochondrial gene order, in all 11 mitochondrial genome sequences. The mitochondrial gene order in A. aeneus, A. lugubris, A. vagrans, A. ferreus, all sampled lineages of A. flavipunctatus, and a subset of the sampled lineages of A. hardii includes a gene order exchange between ND6 and CYTB as well as the accumulation of repetitive, non-coding DNA in this region (Fig. 2a). This pattern is consistent with (1) an initial duplication of ND6, trn-E, CYTB, trn-T, an intergenic spacer, and trn-P, followed by (2) complete excision of one copy of ND6 + trn-E and decay of the remaining sequences to either recognizable pseudogenes (i.e. ψtrn-P) or unrecognizable tandem repeats [19]. Based on these results, we infer that this gene rearrangement is a synapomorphy for Aneides that occurred at the base of the clade (Fig. 1). Divergence date estimates for the split between Aneides and its sister taxon (i.e. Desmognathus + Phaeognathus) range from 36 to 69 mya, providing maximum and minimum dates for this duplication [38]. Thus, non-coding DNA resulting from the gene duplication at the base of Aneides has persisted in the genome of the descendant lineages for tens of millions of years.
Gene order is polymorphic within A. hardii
Of the four new A. hardii mitochondrial genomes we sequenced for this study, two individuals have the gene order we show here to be ancestral for Aneides (Fig. 2a). The other two individuals have the gene order previously reported for this species that reflects two duplication events [19] (Fig. 2b). Thus, within A. hardii, we identified a recent gene rearrangement where duplicates and intermediate stages of paralog loss are polymorphic within the species. Although our sampling found the gene orders in geographically isolated populations, our sample size is too small to infer that the populations are fixed for the difference in gene order.
Patterns of gene loss following duplication in A. hardii
Since the duplication event occurred within the A. hardii lineage, paralogous sequences within A. hardii genomes have begun to diverge from one another, accumulating point substitutions, deletions, and, to a lesser extent, insertions. Here, we summarize patterns of divergence between duplicates for different types of sequences.
Protein-coding genes
All of ND6 and at least some portion of ND1 were included in the A. hardii-specific duplication (Fig. 2a). The functional copy of ND6 in A. hardii is 519 bp long. In contrast, the putative pseudogene copy of ND6 we identified in RAC25 is 521 bp long, reflecting one insertion of a total length of two bp. We were unable to assemble this region of the genome in RAC20. Sequence identity between ND6 and ψND6 is 97%. ψND6 differs from the functional copy by ten point substitutions and two insertions, which result in several amino acid replacements to stop codons. The longest predicted ORF within ψND6 is 139 amino acids long.
The functional copy of ND1 in A. hardii is 960 bp. ψND1 is a 676 and 685 bp fragment that shares 91.7% and 94.6% sequence identity with the first 682 bp of the functional ND1 in RAC25 and RAC20, respectively. Specifically, ψND1 differs from the functional copy in RAC20 by 30 point substitutions, a 2 bp deletion, and two insertions (5 bp total), which result in several amino acid replacements to stop codons. Similarly, ψND1 differs from the functional copy in RAC25 by 42 point substitutions, four deletions (11 bp total), and two insertions (5 bp total), resulting in several amino acid replacements to stop codons. The longest predicted ORF within ψND1 is 105 amino acids long in RAC25 and 205 amino acids in RAC20. The ψND1 sequences are 94.7% identical between the two individuals. Because we are unable to confirm the total amount of ND1 that was included in the initial duplication, we are unable to determine the total amount of deleted sequence in ψND1. If ND1 were only partially duplicated, it would have become a pseudogene immediately because of its shortened length.
rRNA genes
Both ribosomal RNA genes were included in the A. hardii–specific duplication. Duplicate copies of both rRNAs, as well as the intervening tRNA-V, have been almost completely deleted from genomes with the A. hardii-specific duplication. The functional copies of these three genes together total 2520 bp of contiguous sequence; however, only a 111-bp-long segment of unrecognizable sequence remains of the duplicates.
tRNA genes
tRNA-E, -P, -F, -V, and -L were included in the A. hardii-specific duplication. With the exception of one tRNA, we were able to identify all A. hardii-specific duplicated tRNA gene copies (except tRNA-V, located between the two rRNAs), which appear to largely remain functional based on intact stem-loop structures and the anticodon sequences. Both individuals with the A. hardii-specific duplication retain identical copies of tRNA-P and likely tRNA-E as well (although we were unable to identify a second copy in RAC20). tRNA-L paralog sequences differed by two single-basepair substitutions (one shared, one unique) in both individuals and share 97.7% sequence identity, even though each tRNA sequence is unique. For individuals with the A. hardii-specific duplication, tRNA-F sequences upstream of the intact rRNA genes are identical, while tRNA-F sequences adjacent to the pseudogenized rRNA genes are more variable and likely nonfunctional. The latter tRNA-F differs from the former by three bases (in RAC20) and five bases (in RAC25), with only two differences being shared by both individuals. The relatively short length of tRNA genes (~70 bp) limits the likelihood of accumulating substitutions and indels within these genes, which would allow for the conservation and functional persistence of tRNAs over other duplicated genes of greater length.
Non-coding sequences
Two regions of non-coding, repetitive DNA remain in all Aneides mitochondrial genomes from the ancestral duplication at the base of the clade. These regions are predicted to be the remnants of (1) a copy of CYTB + tRNA-T + IGS and (2) a copy of IGS + tRNA-P. Both were included in the A. hardii-specific duplication, and at least a portion of both sequences is retained in duplicate following the A. hardii duplication. We were unable to fully resolve these regions of the genome because they are repetitive and could not be assembled with confidence. Average sequencing read depth of these regions is at least double that of the rest of the genome, suggesting that these regions occur at least twice as often.
Taken together, our results show that, following the A. hardii-specific duplication event, different duplicate sequences have had dramatically different fates. Ribosomal gene copies have been almost completely deleted from the genome, and the sequence remaining bears no identity to the functional rRNA copies. In contrast, duplicate protein-coding sequences have not been deleted; specifically, ψND6 is now longer than the functional paralog, and ψND1 does not show evidence of substantial deletion. Both duplicated protein-coding sequences have decayed to pseudogenes and now encode short, presumably non-functional ORFs. Duplicate tRNAs (except trn-V) also show no evidence of deletion; tRNA-E, -P, and -L remain apparently functional, retaining 98–100% sequence similarity between the two paralogs.
Loss of duplicate sequences in A. hardii occurred over a shallow evolutionary timescale
We measured the number of substitutions between the A. hardii mitochondrial genomes that (1) retained the ancestral Aneides gene order, and (2) underwent the A. hardii-specific duplication. The average number of nucleotide substitutions per site between the two different mitochondrial gene orders found within A. hardii was low (D
XY
= 0.024). This result is congruent with our nuclear gene phylogenetic analysis, which shows very low genetic divergence among A. hardii individuals (Fig. 1). Thus, all of the duplicate gene sequence decay/loss we identify in the A. hardii-specific duplication has occurred in the time taken to accumulate only 2.4% sequence divergence from the ancestral Aneides mitochondrial genomes.
What evolutionary forces may have shaped duplicate gene loss in A. hardii?
We have demonstrated that two mitochondrial gene duplication events occurred during the history of the genus Aneides — one tens of millions of years ago, and another much more recently within a single species. We have shown that different duplicate sequences have had very different fates: ribosomal gene copies have been almost completely deleted from rearranged Aneides hardii genomes, whereas non-coding sequence from the duplication event that occurred at the base of the clade persists in several species of Aneides. What do these patterns of paralogous sequence evolution suggest about the evolutionary forces that shape duplicate gene loss in mitochondria?
The evolutionary history of the mitochondrial genome, from its inception as a bacterial endosymbiont to its emergence as an essential organellar genome, includes massive genomic reduction [54,55,56]. Consistent with this overall trend, it has been hypothesized that selection for efficiency of mitochondrial replication and transcription would favor the elimination of extraneous sequence. Additionally, the accumulation of noncoding DNA can pose a mutational hazard [3,4,5, 57]. In Aneides, we do not see evidence that deletions of extraneous sequence are frequent targets of strong positive selection. Rather, we see (1) non-coding sequences persisting/expanding in length in all species of Aneides that originated from a duplication event tens of millions of years ago, (2) non-coding sequences (i.e. repeat sequences and pseudo-protein-coding genes) persisting in the A. hardii genomes that underwent a duplication event much more recently, and (3) full-length duplicate tRNAs persisting in A. hardii genomes, all of which suggest that these genomes do not rapidly purge duplicated regions. These patterns have at least two possible (somewhat overlapping) interpretations. First, it is possible that duplicate sequences do not strongly negatively impact mitochondrial function through their effects on rates of duplication and transcription or their mutational hazard. Accordingly, insertions and deletions in these sequences reflect the underlying mutational spectrum, and large deletions are relatively rare. Second, it is possible that large deletions may have enough of an impact on duplication/transcription/harmful mutation rate to be targets of positive selection, but that such large deletions are sufficiently rare, relative to substitutions and smaller insertions and deletions, that duplicate sequences persist in Aneides genomes, despite some negative fitness consequences.
In contrast to the persistence of duplicate tRNAs and pseudogenized protein-coding genes, as well as the persistence/expansion of repetitive sequence, duplicate ribosomal RNA genes have been almost completely deleted since the recent A. hardii duplication event. This suggests that selection may have favored deletion of these sequences. Although the presence of additional identical rRNA gene copies in the mitochondrial genome may be unlikely to have deleterious fitness consequences, the presence of pseudo-rRNA genes that compromise protein translation could negatively impact fitness. Thus, we hypothesize that (1) duplicate rRNA genes sustained mutations that compromised translational capacity, and (2) deletions eliminating such sequences were then favored by selection. With our data, however, we cannot exclude the alternate possibilities that a large deletion eliminating the duplicate rRNA genes was fixed by genetic drift or by selection acting on duplication/transcription. Characterizing the evolutionary fate of other recent rRNA gene duplications would aid in discriminating among these hypotheses.