Mapping 3′ transcript ends in the bank vole (Clethrionomys glareolus) mitochondrial genome with RNA-Seq
© Marková et al. 2015
Received: 24 June 2015
Accepted: 16 October 2015
Published: 26 October 2015
Although posttranscriptional modification of mitochondrial (mt) transcripts plays key roles in completion of the coding information and in the expression of mtDNA-encoded genes, there is little experimental evidence on the polyadenylation status and the location of mt gene poly(A) sites for non-human mammals.
Poly(A)-enriched RNA-Seq reads collected for two wild-caught bank voles (Clethrionomys glareolus) were mapped to the complete mitochondrial genome of that species. Transcript polyadenylation was detected as unmapped adenine residues at the ends of the mapped reads. Where the tRNA punctuation model applied, there was the expected polyadenylation, except for the nad5 transcript, whose polyadenylated 3′ end is at an intergenic sequence/cytochrome b boundary. As in human, two pairs of bank vole genes, nad4l/nad4 and atp8/atp6, are expressed from bicistronic transcripts. TAA stop codons of four bank vole protein-coding genes (nad1, atp6, cox3 and nad4) are incompletely encoded in the DNA and are completed by polyadenylation. This is three genes (nad2, nad3 and cob) less than in human. The bank vole nad2 gene encodes a full stop codon (TAA in one vole and TAG in the other), which is followed by a 2 bp UTR and the gene conforms to the tRNA punctuation model. In contrast, the annotations of the reference mouse and some other rodent mt genomes in GenBank include complete TAG stop codons in both nad1 and nad2, which overlap downstream trnI and trnW, respectively. Thus the RNA-Seq data of bank voles provides a model for stop codons of mt-encoded genes in mammals comparable to humans, but at odds with some of the interpretation based purely on genomic data in mouse and other rodents.
This work demonstrates how RNA-Seq data were useful to recover mtDNA transcriptome data in a non-model rodent and to shed more light on mammalian mtDNA transcriptome and post-transcriptional modification. Even though gene content and organisation of mtDNA are strongly conserved among mammals, annotations that neglect the transcriptome may be prone to errors in relation to the stop codons.
Despite the growing number of animal species with completely sequenced mitochondrial (mt) genomes, surprisingly little is known about the mt transcriptome, even for mammals. Several recent papers have reviewed the mammalian mt transcriptome based on human data [1–3], while for other mammals even the basic characteristics, including the location of the transcript ends or presence of untranslated regions (UTRs) and intergenic sequences (IGSs), have not yet been described. This is surprising given that the mt genome sequence of the mouse (the first non-human mammal sequenced) has been known for over 30 years .
The mt genome of mammals typically consists of ~16.5 kb circular, double-stranded DNA that contains 37 genes coding for 13 protein subunits of the oxidative phosphorylation (OXPHOS) system, two ribosomal RNAs (rRNAs; rrnS and rrnL; the gene nomenclature follows that proposed by Boore , and adopted by e.g. Bernt ), and 22 transfer RNAs (tRNAs). The individual genes are unequally distributed between the heavy (H) and light (L) strands, with the majority of the genes encoded on the H strand, except for the nad6 gene and eight of tRNAs.
Each strand of human mtDNA has its own promoter(s) and both strands are transcribed as polygenic or polycistronic precursor transcription units, which are processed to release tRNAs, rRNAs and mRNAs . There are two H-strand promoter/transcription initiation sites, one located in the control region (HSP1) and one near the 5′ end of rrnS (HSP2), and one L-strand promoter located in the control region (LSP) . Both HSP1 and HSP2 initiate transcription in the same direction, but the transcription from HSP2 produces a full-length transcript covering the two rRNAs, 12 mRNAs and 13 tRNAs, while the transcription from HSP1 terminates downstream of rrnL, transcribing only trnF, rrnS, trnV and rrnL [7, 8]. The existence of two distinct promoters is probably the basis for the differential regulation of synthesis of the rRNAs and H-strand encoded mRNAs, with the transcription from HSP1 being responsible for the synthesis of the majority of rRNA . Similar to HSP2, the transcription initiated at LSP covers a large portion of the mt genome and includes the one mRNA (nad6 gene) and the eight tRNAs encoded on the L strand .
Most of cleavage events required to release the mRNAs, tRNAs and rRNAs from these precursor transcripts can be accounted for by the ‘tRNA punctuation’ model . According to the model, tRNAs are endonucleolytically excised from the precursor transcripts and the fragments that remain are processed to functional mRNA and rRNA transcripts. The 5′ and 3′ ends of the mRNA and rRNA transcripts are therefore defined by the endpoints of the intervening tRNA sequences . This model ensures that adjacent mRNA and tRNA genes do not overlap or a tRNA excision would result in truncated mRNA (similar arguments apply for rRNA genes). A striking, well-demonstrated feature of human mt transcription in which it deviates from the tRNA punctuation model is that two transcription units remain as bicistronic elements containing overlapping open reading frames (ORFs) encoding atp8 and atp6, and nad4l and nad4, respectively . Such overlapping ORFs are present in mtDNA sequences of other mammals too, but a presence of mature mt bicistronic mRNAs in mammals other than human remains to be demonstrated.
Transcript polyadenylation sites
Posttranscriptional stop-codon completion
Rather than a complete stop codon in the DNA, three of the PCGs (nad1, cox3 and nad4) have a solitary T residue before the 5′end of a downstream tRNA, and in the case of the atp6 gene, a TA before the 5′end of the downstream adjacent cox3 gene (Fig. 5). In all these genes, the poly(A) site was located immediately following the last DNA-encoded nucleotide of the stop codon, i.e. T in the case of nad1 (Fig. 3), cox3 and nad4, and A in the case of atp6. Therefore, in these four genes a TAA stop codon was created by post-transcriptional polyadenylation that changed T or TA residues to the complete TAA stop codon (Fig. 5). A TAA stop codon thus terminates the translation of all mRNAs in the bank vole mtDNA. The exceptions are nad2 and nad3 terminating with TAA in the vole 1815, but with TAG in the vole 1634.
Bicistronic transcripts with overlapping reading frames
No poly(A) end-reads mapped to the 3′ ends of atp8, nad4l and nad6. The reading frames of the two H-strand genes (atp8 and nad4l) overlap the reading frames of the respective downstream genes in a tail-to-head manner by 43 bp (atp8/atp6) and 7 bp (nad4l/nad4) (Fig. 2). The reading frame overlap and the absence of a poly(A) tail from the more upstream gene in each pair (atp8, nad4l) most likely mean that in each case the two genes are expressed from the same, bicistronic transcript (atp8/atp6 and nad4l/nad4). This explains the absence of poly(A) tails downstream of the ORF of the more upstream gene in each case (atp8 and nad4l; Fig. 4). It also explains why atp8 and nad4l show no drop in coverage observed for the tRNA genes (Fig. 4), which is due to the library enrichment for poly(A)-tailed RNA.
Exception to the tRNA punctuation model
There is also no intervening tRNA sequence between the atp6 and cox3 genes (Figs. 2 and 5). The reading frames of neighbouring PCGs in the bank vole mt genome are interrupted with tRNA genes, with the exception of the two pairs of overlapping genes (atp8/atp6, nad4l/nad4; see above) and of the atp6 and cox3 genes, which are attached to each other in a tail-to-head manner (Fig. 2). However, unlike the more upstream genes associated with the two bicistronic transcripts (atp8 and nad4l), both atp6 and cox3 transcripts showed clear evidence of polyadenylation (Fig. 4), which completed the TAA stop codon of both these genes (Fig. 5). Therefore, the bank vole cox3 gene is expressed from a different mature mRNA than the atp8/atp6 transcript. These transcripts thus represent an apparent exception to the tRNA punctuation model in that another mechanism needs to be postulated to account for the cleavage site between atp6 and cox3.
Untranslated and intergenic regions
Mature transcripts of six PCGs contained a UTR at the 3′ end, defined as the region between the stop codon and the poly(A) site. Most 3′ UTRs were a single bp (C in nad3) or a few bp (AC in nad2, AT in cob and TAA in cox2) long. However, two monocistronic mRNA transcripts contained substantially longer 3′ UTRs (Fig. 5). First, the cox1 transcript contained a 69 bp long 3′UTR corresponding to the full antisense sequence of trnS followed by 3 bp of IGS sequence (AAT; Fig. 5; see below). Second, the 595 bp long nad5 3′UTR contained full antisense sequences of nad6 and trnE (nad5 and nad6 overlap tail-to-tail by 4 bp), and a 5 bp IGS (ATCTA; Figs. 4 and 5).
Intergenic sequences in the bank vole mitochondrial genome
Type of sequence
Origin of light-strand replication, OL
Although posttranscriptional modification of mt transcripts plays key roles in complementation of the coding information and in the expression of mtDNA-encoded genes, there is little experimental evidence on the polyadenylation status and the location of poly(A) sites in mtDNA for mammals other than humans [16, 17]. The present study starts to fill this gap by using RNA-Seq reads derived from poly(A)-enriched bank vole RNA to characterise the 3′ ends of the mt transcripts in this species.
The general characteristics of the bank vole mt transcriptome match the human model. All PCGs except one (nad6) are transcribed from the H-strand. The H-strand transcriptome consist of ten monocistronic and two bicistronic mRNAs. All bank vole transcripts punctuated by tRNA genes at their 3′ ends are polyadenylated (Fig. 4), as found for human mtDNA . Our data demonstrate that the polyadenylation of bank vole mt transcripts occurred precisely at the sites predicted by the tRNA punctuation model, except for the nad5 transcript, whose 3′ end is at an IGS/cob boundary (Fig. 5). This deviation from the tRNA punctuation model matches previous findings in humans . The two mature bicistronic transcripts correspond to the mRNA for atp8/atp6 subunits and the mRNA for nad4l/nad4 subunits, with ORFs overlapping by 43 and 7 bp, respectively. The expression of these genes from bicistronic units has been demonstrated in humans , and although the ORFs similarly overlap in mtDNA of other mammals [4, 19], only for the bank vole have we provided experimental evidence in support of the existence of mature bicistronic transcripts. This shows that this feature of mtDNA transcription is shared by rodents and primates.
Our RNA-Seq data provided no evidence for the location of the 3′ end of the mRNA for nad6. This is the only PGC encoded on the L-strand . The ORFs of bank vole nad5 and nad6 overlap tail-to-tail for 4 bp and the antisense sequence of nad6 is fully contained in the 3′ UTR of nad5 (Fig. 5). However, the bank vole nad6 ORF is not punctuated by a downstream tRNA and its mRNA does not appear to be stably polyadenylated near the 3′ end of the ORF, consistent with human [1, 18]. The location of the end of the bank vole nad6 transcript therefore remains unknown as does its polyadenylation status. Thus, while our approach allowed accurate positioning of the 3′ ends of major transcripts for all H-strand PCGs (Fig. 4), RNA extracted from enriched mitochondria and/or RT-PCR or deep sequencing strategies specifically targeting transcript ends will be required to describe the ends of the L-strand encoded transcript(s) as well as of rrnS and rrnL transcripts, and to test for possible additional minor poly(A) sites [2, 20].
Four of the bank vole PCGs (nad1, atp6, cox3 and nad4) have TAA stop codons that are incompletely encoded in the DNA and are completed posttranscriptionally by polyadenylation, which is demonstrated by the presence of a poly(A) site immediately downstream of the last DNA-encoded stop codon nucleotide (T or A; Fig. 5). This finding is in concordance with Slomovic et al.  who found in human cells that polyadenylation is a key step in completion of stop codons only partly encoded by mtDNA. Stop codons of seven PCGs in human mtDNA (including nad2, nad3 and cob) require completion by polyadenylation . Therefore, while the mechanism is shared by the two species, there is heterogeneity in the role of polyadenylation in completion of the coding information of the individual genes between human and the bank vole as some of the genes with incomplete stop codons in human encode full stop codons in the bank vole. For example, while the stop codon of nad1 is only partly encoded in the DNA (TA and T, respectively) and is posttranscriptionally completed to TAA in both species (Fig. 1) , nad2 encodes an incomplete stop codon (T) in human, but a full stop codon in the bank vole (TAA in one vole and TAG in the other), where it is further followed by 2 bp of UTR sequence (AC) (Fig. 5). The poly(A) site immediately follows this short 3′UTR (Fig. 5), which demonstrates that the bank vole nad2 mRNA also fully conforms to the tRNA punctuation model, but that in this species polyadenylation is not required to complete the stop codon.
In contrast to the evidence from human  and bank vole, annotations of various rodent mt genome sequences in GenBank (including the reference mouse mt genome) include complete TAG stop codons in the CDS of both nad1 and nad2 that overlap the downstream trnI and trnW by 2 bp (AG) in each case (Fig. 1). Such CDS/tRNA overlap would imply a deviation from the tRNA punctuation model  and an alternative RNA processing mechanism would need to be postulated. The discordance in nad1 and nad2 annotation among different species reflects insufficient transcriptome data. We believe that unambiguous determination of stop codons is, in some cases, not possible based solely on genomic sequence.
The present study is a demonstration of how deep RNA-Seq from total RNA was useful to recover mtDNA transcriptome data in a non-model rodent species. This approach has previously been applied to study mt gene expression of an insect, including polyadenylation and polycistronic transcripts . We show that RNA-Seq data are helpful in evaluation of gene boundaries, polyadenylation and post-transcriptional modification in mammalian mtDNA. Even though the structure, genetic content and organization of mtDNA are strongly conserved among mammals and human mtDNA is considered a paradigm for the whole class , it is imperative that sequencing of new mt genomes incorporate evidence from the transcriptome as often as possible, as annotations based on genomic data only are liable to errors. Fortunately, typical mammalian whole transcriptome RNA-Seq data contain a high proportion of reads matching the mt transcripts that remain largely unexplored in most gene expression studies, but which can be extremely useful in mt transcriptome characterization. This includes RNA-Seq data from published studies that are available in read archives. As Smith  points out, ‘the bulk of the organelle-derived RNA-Seq reads in public databases are waiting to be analysed’. Therefore, for some species (such as the mouse), it should be possible to rectify the situation even with the data that are now available.
Samples and RNA extraction
RNA-Seq data were collected for two adult bank voles from England, a male from near Diptford in Devon (vole no. 1634; P. Kotlík specimen database) and a female from near Cirencester in Gloucestershire (vole no. 1815). The individuals were trapped and taken under the appropriate Natural England (general) licence for wildlife management, strictly following the guidelines therein. The bank vole is a widespread Eurasian rodent species of the family Cricetidae, which is taxonomically placed within the same superfamily Muroidea as the house mouse (which is in the family Muridae). Total RNA was extracted with RNeasy Mini Kit (Qiagen, Valencia, CA, USA) from spleen samples stored in RNAlater, followed by DNase treatment using TURBO DNA-free Kit (Ambion, Austin, TX, USA) and an additional clean-up step using the RNeasy Mini Kit.
Library preparation and sequencing were performed with standard Illumina (San Diego, CA, USA) protocols on the Illumina HiSeq 2000 as described previously [12, 22, 23]. Briefly, mRNA molecules with stretches of poly(A) residues at the 3′ end were separated from RNAs that lack a poly(A) tail. After the poly(A) enrichment and fragmentation, the RNA was size-selected to 250–400 bp, reverse transcribed into cDNA, end-repaired and PCR-enriched. The resulting libraries were sequenced using the 100-bp paired-end module.
Mapping reads to the genome
The Illumina reads obtained for each vole were mapped to the annotated reference sequence of the bank vole mt genome (GenBank:KF918859) . Read mapping was performed using the CLC Genomics Workbench, version 6.0.1 (CLC bio A/S, Aarhus, Denmark), with a minimum length fraction of 0.9 and minimum similarity fraction of 0.95. This means that only those reads were considered where at least 90 % of the read matched the reference sequence with at least 95 % identity. Such a setting ensured the specificity of the mapping, but allowed for the reads with poly(A) tails to be mapped with unaligned ends. Reads that mapped with unaligned three or more non-template adenine nucleotides at the 3′ end were considered indicative of poly(A) sites, a criterion similar to that adopted by previous studies [2, 14]. To ascertain the location of the ends of the tRNA genes, the cloverleaf secondary structure of each tRNA was predicted using MITOS . The secondary structure of the putative OL was predicted with the CLC Genomics Workbench.
The work was carried out with the support from the Czech Science Foundation (grant number P506-11-1872) and the institutional support (RVO 67985904).
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Temperley RJ, Wydro M, Lightowlers RN, Chrzanowska-Lightowlers ZM. Human mitochondrial mRNAs-like members of all families, similar but different. Biochim Biophys Acta. 2010;1797:1081–5.PubMed CentralView ArticlePubMedGoogle Scholar
- Mercer TR, Neph S, Dinger ME, Crawford J, Smith MA, Shearwood AM, et al. The human mitochondrial transcriptome. Cell. 2011;146:645–58.PubMed CentralView ArticlePubMedGoogle Scholar
- Peralta S, Wang X, Moraes CT. Mitochondrial transcription: Lessons from mouse models. Biochim Biophys Acta. 2012;1819:961–9.Google Scholar
- Bibb MJ, Van Etten RA, Wright CT, Walberg MW, Clayton DA. Sequence and gene organization of mouse mitochondrial DNA. Cell. 1981;26:167–80.View ArticlePubMedGoogle Scholar
- Boore JL. Requirements and standards for organelle genome databases. OMICS. 2006;10:119–26.View ArticlePubMedGoogle Scholar
- Bernt M, Donath A, Juhling F, Externbrink F, Florentz C, Fritzsch G, et al. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol. 2013;69:313–9.View ArticlePubMedGoogle Scholar
- Montoya J, Gaines GL, Attardi G. The pattern of transcription of the human mitochondrial rRNA genes reveals two overlapping transcription units. Cell. 1983;34:151–9.View ArticlePubMedGoogle Scholar
- Martin M, Cho J, Cesare AJ, Griffith JD, Attardi G. Termination factor-mediated DNA loop between termination and initiation sites drives mitochondrial rRNA synthesis. Cell. 2005;123:1227–40.View ArticlePubMedGoogle Scholar
- Ojala D, Montoya J, Attardi G. tRNA punctuation model of RNA processing in human mitochondria. Nature. 1981;290:470–4.View ArticlePubMedGoogle Scholar
- Rossmanith W, Tullo A, Potuschak T, Karwan R, Sbisa E. Human mitochondrial tRNA processing. J Biol Chem. 1995;270:12885–91.View ArticlePubMedGoogle Scholar
- Bendová K, Marková S, Searle JB, Kotlík P. The complete mitochondrial genome of the bank vole Clethrionomys glareolus (Rodentia: Arvicolinae). Mitochondrial DNA. 2014:doi:10.3109/19401736.2013.873927.
- Filipi K, Marková S, Searle JB, Kotlík P. Mitogenomic phylogenetics of the bank vole Clethrionomys glareolus, a model system for studying end-glacial colonization of Europe. Mol Phylogenet Evol. 2015;82:245–57.View ArticlePubMedGoogle Scholar
- Torres TT, Dolezal M, Schlotterer C, Ottenwalder B. Expression profiling of Drosophila mitochondrial genes via deep mRNA sequencing. Nucleic Acids Res. 2009;37:7509–18.PubMed CentralView ArticlePubMedGoogle Scholar
- Wang HL, Yang J, Boykin LM, Zhao QY, Li Q, Wang XW, et al. The characteristics and expression profiles of the mitochondrial genome for the Mediterranean species of the Bemisia tabaci complex. BMC Genomics. 2013;14:401.PubMed CentralView ArticlePubMedGoogle Scholar
- Waern K, Snyder M. Extensive transcript diversity and novel upstream open reading frame regulation in yeast. Genes Genomes Genetics. 2013;3:343–52.PubMed CentralPubMedGoogle Scholar
- Taanman JW. The mitochondrial genome: structure, transcription, translation and replication. Biochim Biophys Acta. 1999;1410:103–23.View ArticlePubMedGoogle Scholar
- Nagaike T, Suzuki T, Ueda T. Polyadenylation in mammalian mitochondria: insights from recent studies. Biochim Biophys Acta. 2008;1779:266–9.View ArticlePubMedGoogle Scholar
- Slomovic S, Laufer D, Geiger D, Schuster G. Polyadenylation and degradation of human mitochondrial RNA: the prokaryotic past leaves its mark. Mol Cell Biol. 2005;25:6427–35.PubMed CentralView ArticlePubMedGoogle Scholar
- Gadaleta G, Pepe G, De Candia G, Quagliariello C, Sbisa E, Saccone C. The complete nucleotide sequence of the Rattus norvegicus mitochondrial genome: cryptic signals revealed by comparative analysis between vertebrates. J Mol Evol. 1989;28:497–516.View ArticlePubMedGoogle Scholar
- Stewart JB, Beckenbach AT. Characterization of mature mitochondrial transcripts in Drosophila, and the implications for the tRNA punctuation model in arthropods. Gene. 2009;445:49–57.View ArticlePubMedGoogle Scholar
- Smith DR. RNA-Seq data: a goldmine for organelle research. Brief Funct Genomics. 2013;12:454–6.View ArticlePubMedGoogle Scholar
- Kotlík P, Marková S, Vojtek L, Stratil A, Šlechta V, Hyršl P, et al. Adaptive phylogeography: functional divergence between haemoglobins derived from different glacial refugia in the bank vole. Proc R Soc B Biol Sci. 2014;281:20140021Google Scholar
- Marková S, Searle JB, Kotlík P. Relaxed functional constraints on triplicate alpha-globin gene in the bank vole suggest a different evolutionary history from other rodents. Heredity. 2014;113:64–73.View ArticlePubMedGoogle Scholar
- Zuker M. The use of dynamic programming algorithms in RNA secondary structure prediction. In: Waterman MS, editor. Mathematical methods for DNA Sequences. Boca Raton, Florida: CRC Press; 1989. p. 159–84.Google Scholar