The complete mitochondrial genome of the bag-shelter moth Ochrogaster lunifer (Lepidoptera, Notodontidae)
© Salvato et al. 2008
Received: 21 May 2008
Accepted: 15 July 2008
Published: 15 July 2008
Knowledge of animal mitochondrial genomes is important for understanding their molecular evolution and for phylogenetic and population genetic studies. The Lepidoptera, with more than 160,000 described species, is one of the largest insect orders. To date only nine lepidopteran mitochondrial DNAs have been fully sequenced and two others partly sequenced, and the taxon sampling is very scant. Progress in lepidopteran mitogenomics therefore requires new genomes derived from a broad taxon sampling. In the present work we describe the mitochondrial genome of the moth Ochrogaster lunifer.
The mitochondrial genome of O. lunifer is a circular molecule 15593 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes and also contains 7 intergenic spacers. The gene order of the newly sequenced genome is the one typical of Lepidoptera and differs from the insect ancestral type in the placement of trnM. The 77.84% A+T content of its α strand is the lowest among known lepidopteran genomes. The mitochondrial genome of O. lunifer exhibits one of the most marked C-skews among available insect Pterygota genomes. The protein-coding genes have typical mitochondrial start codons except for cox1, which presents an unusual CGA. The O. lunifer genome exhibits the least biased synonymous codon usage among lepidopterans. Comparative genomic analysis identified atp6, cox1, cox2, cox3, cob, nad1, nad2, nad4, and nad5 as potential markers for population genetic and phylogenetic studies. A peculiar feature of the O. lunifer mitochondrial genome is that the intergenic spacers are mostly made of repetitive sequences.
The mitochondrial genome of O. lunifer is the first from a representative of the superfamily Noctuoidea, which accounts for about 40% of all described Lepidoptera. The new genome shares many features with other known lepidopteran genomes. It differs, however, in its low A+T content and marked C-skew. Compared to other lepidopteran genomes it is less biased in synonymous codon usage. Comparative evolutionary analysis of lepidopteran mitochondrial genomes allowed the identification of previously neglected coding genes as potential phylogenetic markers. The presence of repetitive elements in the intergenic spacers of the O. lunifer genome supports the role of DNA slippage as a possible mechanism for producing spacers during replication.
List of taxa analyzed in the present paper
The O. lunifer mtDNA has the typical lepidopteran gene order [8, 9], which differs from the ancestral gene order of insects in the placement of trnM. In the ancestral type (e.g. Drosophila yakuba mtDNA) the order in the α strand is: A+T region, trnI, trnQ, trnM, nad2. In all lepidopteran mtDNAs sequenced to date the order is: A+T region, trnM, trnI, trnQ, nad2, which implies the translocation of trnM [5–11]. This placement of trnM is a molecular feature exclusive to lepidopteran mtDNAs. Further genome sequencing is necessary to establish whether this feature is a mitochondrial signature of the whole order Lepidoptera.
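The difference between the two arrangements can be illustrated with a minimal Python sketch. The gene lists are taken from the text above; the function name `translocated_genes` and the single-gene test it applies are illustrative assumptions, not part of the original analysis.

```python
# Gene blocks between the A+T region and nad2, as given in the text.
ANCESTRAL = ["A+T region", "trnI", "trnQ", "trnM", "nad2"]
LEPIDOPTERAN = ["A+T region", "trnM", "trnI", "trnQ", "nad2"]


def translocated_genes(reference, derived):
    """Return genes whose removal makes the two gene orders identical.

    Hypothetical helper: it only tests single-gene moves, so a swap of two
    adjacent genes would report both of them.
    """
    if reference == derived:
        return []
    moved = []
    for gene in reference:
        ref_rest = [g for g in reference if g != gene]
        der_rest = [g for g in derived if g != gene]
        if ref_rest == der_rest:
            moved.append(gene)
    return moved


print(translocated_genes(ANCESTRAL, LEPIDOPTERAN))  # ['trnM']
```

Removing trnM from both lists leaves the same order, which is why a single translocation of trnM is enough to explain the lepidopteran arrangement.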
The composition of the α strand of O. lunifer mtDNA is A = 6252 (40.09%), T = 5886 (37.75%), G = 1179 (7.56%) and C = 2276 (14.60%).
The average A+T% value for the analyzed mtDNA set is 76.63 ± 4.84. The highest A+T% values are shared by the mtDNAs of three bees (Apis mellifera, Bombus ignitus and Melipona bicolor) and two bugs (Aleurodicus dugesii and Schizaphis graminum). All lepidopteran mtDNAs except that of O. lunifer exhibit high A+T% values. The A+T content of O. lunifer mtDNA is 77.84%, the lowest value among complete lepidopteran mtDNAs [5–8, 10]. The lowest A+T contents are found in the termite mtDNAs (Reticulitermes spp.). Extreme A+T values are also shared by species having a highly rearranged gene order. However, a rearranged genome is not sufficient per se to produce an A+T content drastically departing from the average (e.g. Aleurochiton aceris and Bemisia tabaci). The A+T values appear to be linked to taxonomic relatedness at low rank (i.e. genus, family) (e.g. species of Drosophila, species of Bactrocera, members of the family Apidae). The relation does not hold at higher ranks (i.e. superfamily, order), where patterns become inconsistent and the A+T content can differ greatly among species, as exemplified by Hemiptera (A. dugesii vs. Triatoma dimidiata).
The average A-skew is 0.04214 ± 0.11350, and most pterygote mtDNAs are slightly to moderately A-skewed, with values ranging from 0.00287 (B. ignitus) to 0.18247 (Locusta migratoria). The lepidopteran A-skews vary from -0.04748 (C. raphaelis) to 0.05872 (B. mori), with the O. lunifer mtDNA exhibiting a slight A-skew (0.03015). The Reticulitermes mtDNAs, which have the lowest A+T% content, exhibit a very pronounced A-skew. The most marked T-skews are observed in the mtDNAs of Campanulotes bidentatus and Trialeurodes vaporariorum, which have low A+T% content and gene orders different from the insect ancestral gene order [1, 14, 15]. Gene order rearrangement is not necessarily linked to strong A/T-skew, as shown by the highly rearranged, but low-skew, genome of Heterodoxus macropus.
The average G+C% content is 23.37 ± 4.84. The G+C% pattern among species is the complement of the A+T% pattern and thus requires no further comment. The G/C-skew distribution is more complex. The average G-skew is -0.16006 ± 0.138235. Most pterygote mtDNAs are C-skewed, with G-skew values ranging from -0.32827 (Vanhornia eucnemidarum) to -0.01250 (Heterodoxus macropus). The main exception is represented by the mtDNAs of bugs, while the most G-skewed genome is that of C. bidentatus. Most lepidopteran mtDNAs share very similar G-skew values that fall within the bulk of mtDNAs. The notable exception is the newly determined mtDNA of O. lunifer, which exhibits the second most pronounced C-skew (G-skew = -0.31751) among the analyzed genomes.
G-skew can be markedly different even in species belonging to the same genus and having a very similar G+C content, as exemplified by the Reticulitermes santonensis and Reticulitermes virginicus mtDNAs. The same reasoning applies at high taxonomic rank to the Hemiptera. The mtDNA of C. bidentatus exhibits very high A-skew and G-skew. However, this feature is not a general rule, and extreme A-skew and G-skew are not necessarily linked to each other, as shown by species of the genus Reticulitermes that exhibit very strong A-skews but not G-skews.
The list of currently available mtDNAs reveals a strong bias in terms of taxon sampling at both low and high taxonomic ranks within Pterygota. A direct consequence is that present knowledge of base composition and A/G skews reflects such biases, and the addition of a single taxon can change our view of these features. This point is well exemplified by the O. lunifer mtDNA, which exhibits an A+T percentage different from that of other lepidopteran mtDNAs, which share high A+T contents [8, 9]. Thus a broad and more balanced taxon sampling appears necessary to investigate and identify general patterns for the parameters considered above.
The mtDNA of O. lunifer contains the full set of PCGs usually present in animal mtDNA. PCGs are arranged along the genome according to the standard order of insects (Figure 1). The putative start codons of PCGs are those previously known for animal mtDNA, i.e. ATN, GTG, TTG, GTT, with the only exception represented by the CGA start codon of the cox1 gene. This non-canonical putative start codon is also found in the butterfly A. melete and in the moths A. honmai, B. mori, B. mandarina, M. sexta and P. atrilineata [5–8]. In the butterfly C. raphaelis the tetranucleotide TTAG is the putative start codon, and the hexanucleotide TATTAG has been suggested as the putative start codon for the moths O. nubilalis and O. furnacalis. An unusual start codon for the cox1 gene is known in various arthropod mtDNAs.
The cox1, cox2, nad5, and nad4 genes of O. lunifer mtDNA have incomplete stop codons. The presence of incomplete stop codons is a feature shared with all lepidopteran mtDNAs sequenced to date [5–10] and, more generally, with many arthropod mtDNAs.
The atp8 and atp6 genes of O. lunifer are the only PCGs that overlap, by seven nucleotides (Figure 1). This feature is common to all known lepidopteran mtDNAs [5–10] and is found in many animal mtDNAs.
The total number of non-stop codons (CDs) used by the 12 analyzed mtDNAs is very similar, ranging from 3695 in C. raphaelis to 3732 in O. lunifer. The codon families behave very similarly among the considered species. The eight codon families with at least 50 CDs per thousand CDs (Leu2, Ile, Phe, Met, Asn, Ser2, Gly, Tyr) encompass on average 65.82% ± 1.20% of all CDs. The three families with at least 100 CDs per thousand CDs (Leu2, Ile, Phe) account for on average 35.36% ± 0.98% of all CDs (Figure 3). The A+T-rich CDs are favored over synonymous CDs with lower A+T content, as shown by the RSCU results (Figure 4). This point is well exemplified by the Leu2 family, where the TTA codon accounts for the large majority of CDs in the family (see below). The invertebrate mitochondrial code includes 62 amino-acid-encoding codons. Among the 12 analyzed genomes the total number of codons used appears to be directly linked to the A+T content. The C. raphaelis mtDNA, which has the highest A+T% content (see Figure 2), uses 52 codons and never uses the 10 G+C-rich codons listed in Figure 2. Conversely, the O. lunifer mtDNA, characterized by the lowest A+T% among the considered lepidopteran genomes, uses all 62 codons. Differences in the number of CDs used are present between species of the same genus (e.g. B. mandarina vs. B. mori), even if the discrepancies appear confined to G+C-rich CDs with very limited use (e.g. GCG and CGC). The Leu1 (average = 11.73 ± 3.82%) and Leu2 (average = 88.44 ± 3.89%) codon families are very differently represented in lepidopteran PCGs, while Ser1 (average = 34.95 ± 3.67%) and Ser2 (average = 64.05 ± 1.09%) exhibit a more balanced composition.
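As a minimal illustration of how RSCU expresses the preference for A+T-rich codons within a family, the sketch below computes RSCU as the observed count of a codon divided by the count expected if all synonymous codons of the family were used equally. The counts for the Leu2 family (TTA, TTG) are hypothetical and are not values from the analyzed genomes; the study itself obtained RSCU values with MEGA 4.

```python
def rscu(codon_counts):
    """Relative Synonymous Codon Usage for one synonymous codon family.

    codon_counts maps each codon of the family to its observed count.
    RSCU = observed count / (family total / number of synonymous codons).
    """
    total = sum(codon_counts.values())
    if total == 0:
        return {codon: 0.0 for codon in codon_counts}
    expected = total / len(codon_counts)
    return {codon: count / expected for codon, count in codon_counts.items()}


# Hypothetical counts for the two Leu2 codons (illustration only).
leu2_counts = {"TTA": 450, "TTG": 50}
leu2_rscu = rscu(leu2_counts)
print(leu2_rscu)  # {'TTA': 1.8, 'TTG': 0.2}
```

A value above 1 marks a codon used more often than expected under equal usage, and the RSCU values of a family sum to the number of codons in that family.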
Four amino acid residues (Leu, Ile, Phe and Ser) account for more than 44.50% (average = 45.68 ± 0.58%) of all residues forming the 13 mitochondrial proteins. The Leu and Ile amino acids share hydrophobic side chains, Phe is also hydrophobic, and Ser exhibits an aliphatic behavior; their massive presence is thus striking but not surprising for membrane proteins.
The trnA, trnD, trnG, trnK, trnL1, trnL2, trnQ, and trnS2 genes of Ochrogaster mtDNA show mismatches in their stems. Mismatches are located mostly in the acceptor and anticodon stems, with a single exception represented by trnD, which exhibits the mismatch in the TΨC stem. Mismatches in tRNA stems are also known for the trnA, trnL1, trnL2, and trnQ of C. raphaelis. Mismatches observed in tRNAs are corrected through RNA-editing mechanisms that are well known for arthropod mtDNA.
Preliminary analysis performed on rrnL and rrnS of O. lunifer revealed that these genes are capable of folding into structures (data not shown) similar to those already produced for lepidopteran mitochondrial ribosomal subunits [8, 25, 26]. Further studies, that extend the taxon sampling, are currently in progress in our lab to better define rrnL and rrnS structures within the Thaumetopoeinae subfamily that includes also O. lunifer.
The s1 spacer, located between trnQ and nad2, appears to be the result of a duplicated segment (Figure 7). The s1 spacer is present in all 12 lepidopteran mtDNAs so far sequenced, while it is absent in other insects. While the genomic location is constant, the sequence divergence among species is high. Further investigation with a broad taxon sampling within the Lepidoptera is necessary to assess whether the s1 spacer is a constant molecular signature of lepidopteran mtDNA.
The s2 spacer, placed between trnC and trnY, derives from the triplication of a six-nucleotide motif with minor changes (Figure 7). An 11 bp spacer between trnC and trnY is also found in the mtDNA of A. melete and shares the ACAATT motif with the s2 spacer of O. lunifer. Because no other known lepidopteran mtDNA exhibits such a spacer, its presence in A. melete and O. lunifer has to be interpreted as the result of independent events.
Spacer s3, located between nad3 and trnA, exhibits a partially duplicated segment and a poly-T motif within the first 30 nt. The second half of the s3 spacer is characterized by two microsatellite repeats, (CA)10(TA)12. Spacers with the same genomic location and containing TA microsatellites are also found in the B. mori and B. mandarina mtDNAs.
Spacer s4, inserted between trnE and trnF, contains a 5' microsatellite (TA)23, while the 3' half seems to be the triplication of a 10-nucleotide motif with some changes (Figure 7). A spacer characterized by a different motif, (TATTA)31, but having the same genomic placement is found in the A. honmai mtDNA.
The spacer s5, located between trnS2 and nad1, contains the ATACTAA motif, which is conserved across the order Lepidoptera. This motif is possibly fundamental to site recognition by the transcription termination peptide (mtTERM protein). Spacer s5 is present in most insect mtDNAs, even if the nucleotide sequence can be quite divergent.
The s6 spacer is located between trnS2 and rrnL and exhibits a dinucleotide microsatellite (TA)19 directly in contact with the 3' end of the rrnL gene. To date spacer s6 is known only for the mtDNA of O. lunifer.
The s7 spacer coincides with the A+T region. Several features common to the lepidopteran A+T region are present in the s7 spacer. The ORβ (origin of the β strand replication) is located 21 bp downstream from the rrnS gene in B. mori. It contains the motif ATAGA followed by an 18 bp poly-T stretch. A very similar pattern occurs in O. lunifer, where the ATAGA motif is located 17 bp downstream from the rrnS gene and is followed by a 20 bp poly-T stretch (Figure 7). A microsatellite-like (AT)7(TA)3 element preceded by the ATTTA motif is present in the 3' third of the O. lunifer s7 spacer. The presence of a microsatellite preceded by the ATTTA motif is also a feature found in the A+T regions of other Lepidoptera. Finally, a 10 bp poly-A is present immediately upstream of trnM. This element (a poly-T in the β strand) is also a common feature of the A+T region in Lepidoptera [8, 28]. No large repeated segments were detected in the A+T region of O. lunifer. This arrangement is consistent with other lepidopteran A+T regions, while it contrasts markedly with patterns observed in other insect orders [8, 29].
Intergenic spacers containing repeated elements are scattered all over the lepidopteran mtDNAs, while repeated elements are restricted mostly to the A+T region in other insects. Most parts of the spacers of O. lunifer are made of repeated motifs. The predominance of repeated elements suggests that mtDNA expansion can be achieved through a mispairing duplication mechanism, i.e. DNA slippage, during genome replication. Several intergenic spacers are restricted to a single butterfly/moth species and have no counterparts even within Lepidoptera. Thus it is plausible to suggest that spacer production occurs independently and recurrently within Lepidoptera. It remains unknown why this feature is so prominent in moths and butterflies and apparently limited, reduced or absent in other insect mtDNAs sequenced to date. This behavior requires further investigation, given that mtDNA intergenic spacers are found in non-insect Arthropoda as well as in other animal phyla [e.g. 18, 30].
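The kind of dinucleotide microsatellite described for the spacers can be located with a simple pattern search. The following Python sketch is an illustration only and is not the procedure used in this study; the function name, the `min_units` threshold and the test sequence are assumptions. The sequence is synthetic and merely mimics the (CA)10(TA)12 arrangement reported for spacer s3.

```python
import re


def find_dinucleotide_repeats(seq, min_units=5):
    """Return (start, unit, n_units) for each perfect dinucleotide repeat.

    A repeat is a two-base unit occurring at least min_units times in a row.
    Homopolymer runs (e.g. poly-T) are skipped because they are not
    dinucleotide microsatellites.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit[0] == unit[1]:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


# Synthetic example mimicking the (CA)10(TA)12 pattern; not real s3 sequence.
synthetic = "GGC" + "CA" * 10 + "TA" * 12 + "GG"
print(find_dinucleotide_repeats(synthetic))  # [(3, 'CA', 10), (23, 'TA', 12)]
```

Only perfect repeats are reported, so motifs "with some changes", such as the triplicated segments of s2 and s4, would need an approximate-matching approach instead.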
The mitochondrial genome of O. lunifer is the first sequenced mtDNA for a representative of the Noctuoidea, a superfamily that includes about 40% of all described lepidopteran species. The newly determined genome shares the gene order, the presence of intergenic spacers, and other features with previously known lepidopteran genomes. The placement of trnM immediately after the A+T region appears to be an exclusive molecular signature of all lepidopteran mtDNAs sequenced to date. Further genome sequencing will establish whether this feature characterizes the whole order Lepidoptera. The mtDNA of O. lunifer exhibits a peculiarly low A+T content and a marked C-skew. Compared to other lepidopteran genomes it is less biased in synonymous codon usage. Comparative analysis of codon usage among lepidopteran mitochondrial genomes identified atp6, cox1, cox2, cox3, cob, nad1, nad2, nad4, and nad5 as potential markers for phylogenetic and population genetic studies. Most of the genes listed above have been previously neglected for the tasks suggested here. The massive presence of repetitive elements in the intergenic spacers of the O. lunifer genome leads us to suggest an important role of DNA slippage as a possible mechanism for producing spacers during replication.
An ethanol-preserved larval specimen of Ochrogaster lunifer collected in Australia (suburb of Kenmore, Queensland, 25 February 2005) by Myron P. Zalucki (University of Queensland) was used as starting material for this study. Total DNA was extracted by applying a salting-out protocol. The quality of the DNA was assessed through electrophoresis in a 1% agarose gel and staining with ethidium bromide.
PCR amplification was performed using a mix of insect universal primers [32, 33] and primers specifically designed on the O. lunifer sequences. For a full list of successful primers as well as PCR conditions see Additional file 1. The PCR products were visualized by electrophoresis in a 1% agarose gel and staining with ethidium bromide. Each PCR product represented by a single electrophoretic band was purified with the ExoSAP-IT kit (Amersham Biosciences) and directly sequenced. Sequencing of both strands was performed at the BMR Genomics service (Padova, Italy) on automated DNA sequencers, mostly employing the primers used for PCR amplification.
The final mtDNA consensus sequence was assembled using the SeqMan II program from the Lasergene software package (DNAStar, Madison, WI). The gene and strand nomenclature used in this paper follows Negrisolo et al.
Sequence analysis was performed as follows. Initially the mtDNA sequence was translated into putative proteins using the Transeq program available at the EBI web site. The true identity of these polypeptides was established using the BLAST program [34, 35] available at the NCBI web site. Gene boundaries were determined as follows. The 5' ends of PCGs were inferred to be at the first legitimate in-frame start codon (ATN, GTG, TTG, GTT) in the open reading frame (ORF) that was not located within the upstream gene encoded on the same strand. The only exception was atp6, which has been previously demonstrated to overlap with its upstream gene atp8 in many mtDNAs. The PCG terminus was inferred to be at the first in-frame stop codon encountered. When the stop codon was located within the sequence of a downstream gene encoded on the same strand, a truncated stop codon (T or TA) adjacent to the beginning of the downstream gene was designated as the termination codon. This codon was assumed to be completed to a full TAA stop codon by polyadenylation after transcript processing. Finally, pairwise comparisons with orthologous proteins were performed with the ClustalW program to better define the limits of PCGs.
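The boundary rules described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' actual procedure: the function names and the coordinates `upstream_end` and `downstream_start` are hypothetical, the sequences in the usage lines are made up, and TAA and TAG are taken as the stop codons of the invertebrate mitochondrial code.

```python
# Start codons listed in the text (ATN expanded) and invertebrate mito stops.
START_CODONS = {"ATA", "ATT", "ATC", "ATG", "GTG", "TTG", "GTT"}
STOP_CODONS = {"TAA", "TAG"}


def find_cds_start(seq, frame_start, upstream_end=0):
    """First in-frame start codon that does not lie within the upstream gene."""
    pos = frame_start
    while pos + 3 <= len(seq):
        if pos >= upstream_end and seq[pos:pos + 3] in START_CODONS:
            return pos
        pos += 3
    return None


def find_cds_end(seq, start, downstream_start=None):
    """Return (end, stop) for the first in-frame stop codon.

    If no complete stop codon occurs before the downstream gene, a truncated
    stop (T or TA) abutting that gene is accepted, as described in the text.
    """
    limit = len(seq) if downstream_start is None else downstream_start
    pos = start
    while pos + 3 <= limit:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            return pos + 3, codon
        pos += 3
    remainder = seq[pos:limit]
    if remainder in ("T", "TA"):
        return limit, remainder
    return None


# Made-up sequences for illustration only.
print(find_cds_start("CCCATTAAATAA", 0))                      # 3
print(find_cds_end("ATGAAACCCTAAGG", 0))                      # (12, 'TAA')
print(find_cds_end("ATGAAACCCTGCATTT", 0, downstream_start=10))  # (10, 'T')
```

The sketch works on a single strand and ignores the atp8/atp6 overlap and the non-canonical cox1 start, both of which required manual treatment in the study.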
The transfer RNA genes were identified using the tRNAscan-SE program or recognized manually as sequences having the appropriate anticodon and capable of folding into the typical cloverleaf secondary structure.
The boundaries of the ribosomal rrnL gene were assumed to be delimited by the ends of the trnV-s6 pair. The 3' end of the rrnS gene was assumed to be delimited by the start of trnV, while the 5' end was determined through comparison with orthologous genes of other Lepidoptera so far sequenced.
Nucleotide composition was calculated with the EditSeq program included in the Lasergene software package. The GC-skew = (G-C)/(G+C) and AT-skew = (A-T)/(A+T) were used to measure the base compositional difference between the different strands or between genes coded on the alternative strands. The Relative Synonymous Codon Usage (RSCU) values were calculated with the MEGA 4 program.
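The two skew formulas can be checked against the α-strand base counts reported for O. lunifer (A = 6252, T = 5886, G = 1179, C = 2276). The short Python sketch below is only a worked example of the formulas; the function name `composition_stats` is an assumption, and the study itself used EditSeq for composition.

```python
def composition_stats(a, c, g, t):
    """A+T percentage, AT-skew and GC-skew from single-strand base counts."""
    total = a + c + g + t
    return {
        "AT_percent": 100 * (a + t) / total,
        "AT_skew": (a - t) / (a + t),   # (A-T)/(A+T)
        "GC_skew": (g - c) / (g + c),   # (G-C)/(G+C)
    }


# Base counts of the O. lunifer alpha strand, as reported in the text.
stats = composition_stats(a=6252, c=2276, g=1179, t=5886)
print(round(stats["AT_percent"], 2))  # 77.84
print(round(stats["AT_skew"], 5))     # 0.03015
print(round(stats["GC_skew"], 5))     # -0.31751
```

The results reproduce the A+T content, A-skew and G-skew values given for the O. lunifer mtDNA in the text.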
Codon usage in the analyzed genomes was investigated by calculating two indices, ENC (Effective Number of Codons used) and MILC (Measure Independent of Length and Composition). ENC and MILC values were calculated with the INCA 2.1 program.
atp6 and atp8: ATP synthase subunits 6 and 8
cox1–3: cytochrome c oxidase subunits 1–3
nad1–6 and nad4L: NADH dehydrogenase subunits 1–6 and 4L
rrnS and rrnL: small and large subunit ribosomal RNA (rRNA) genes
trnX: transfer RNA (tRNA) genes, where X is the one-letter abbreviation of the corresponding amino acid
s1–s7: mitochondrial genomic spacers
A+T region: the putative control region
PCG: protein coding gene
RSCU: Relative Synonymous Codon Usage
MILC: Measure Independent of Length and Composition
We express our sincere thanks to Myron P. Zalucki (School of Integrative Biology, University of Queensland, Brisbane, Australia), who kindly provided the specimen of Ochrogaster lunifer used in the present study. We thank Filippo Calore (Albignasego, Padova, Italy), who painted the icon of O. lunifer included in Figure 1, using as template a picture publicly available at the CSIRO web site. Finally, we thank two anonymous referees who provided very useful suggestions that helped to improve the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.