Comparative and phylogenomic studies on the mitochondrial genomes of Pentatomomorpha (Insecta: Hemiptera: Heteroptera)
© Hua et al. 2008
Received: 17 July 2008
Accepted: 17 December 2008
Published: 17 December 2008
Skip to main content
© Hua et al. 2008
Received: 17 July 2008
Accepted: 17 December 2008
Published: 17 December 2008
Nucleotide sequences and the gene arrangements of mitochondrial genomes are effective tools for resolving phylogenetic problems. Hemipteroid insects are known to possess highly reorganized mitochondrial genomes, but in the suborder Heteroptera (Insecta: Hemiptera), there was only one complete mitochondrial genome sequenced without gene rearrangement and the phylogeny of infraorder Pentatomomorpha in Heteroptera was still uncertain.
Fifteen mitochondrial genomes of the suborder Heteroptera were sequenced. Gene rearrangements were found as follows: 1) tRNA-I and tRNA-Q switched positions in Aradidae, 2) tRNA-T and tRNA-P switched positions in Largidae and Pyrrhocoridae. Two recombination events were found in Alydidae and Malcidae. The other mt-genomes were organized in the same way as observed in Drosophila yakuba. The phylogenetic analyses of infraorder Pentatomomorpha based on the nucleotide sequence raised the hypothesis of (Aradoidea + (Pentatomoidea + (Pyrrhocoroidea + (Lygaeoidea + Coreoidea)))). The rearrangement of tRNA-T and tRNA-P also linked Largidae and Pyrrhocoridae together. Furthermore, the conserved sequence block in the unusual intergenic spacers between tRNA-H and ND4 favored the monophyly of Lygaeoidea. Tetranucleotide ATCA was inferred to be the initiation codon of ND2 in Cydnidae. No correlation was found between the rates of nucleotide substitution and gene rearrangement. CG content was significantly correlated with the nucleotide substitution rate of each gene. For ND1, there was a positive correlation (P < 0.01) between amino acids variations and hydrophobicity, but a negative correlation (P < 0.01) for ND6. No conserved sequence was found among the control regions and these regions were not always the most AT-rich region of the mt-genome.
Heteropteran insects are extremely complex groups worthy of further study because of the unusual tetranucleotide initiation codon and their great mt-genomic diversity, including gene rearrangements and recombinations. The mt-genome is a powerful molecular marker for resolving phylogeny at the level of the superfamily and family. Gene rearrangements were not correlated with nucleotide substitution rates. CG content variation caused the different evolutionary patterns among genes. For ND1, in many polar or nonpolar regions the specific identity of the amino acid residues might be more important than maintaining the polarity of these regions, while the opposite is true for ND6. Most sequences of the control regions did not appear to be important for regulatory functions. Finally, we suggest that the Term "AT-rich regions" should not be used.
To date, more than one hundred complete or nearly complete insect mitochondrial genomes (mt-genomes) and thousands of partial sequences are available in GenBank/EMBL/DDBJ, and the data set is expanding at an increasing pace. Most animal mt-genomes contain the same 37 genes, but the orders of the genes are variable among and within the major groups . The mt-genome is becoming an effective tool to resolve phylogenetic problems due to its unique properties, including maternal inheritance, orthologous genes, and lack of the substantial inTermolecular recombination . Patterns of mitochondrial gene rearrangements are also indicators of phylogenetic relationships [3–5]
In Insecta, mitochondrial gene rearrangements have been found in 11 orders (including some unpublished data) with the others exhibiting the same gene order observed in Drosophila yakuba [6, 7]. It has been found that the hemipteroid insects (including four orders, Hemiptera, Thysanoptera, Psocoptera, and Phthiraptera) have experienced increased rates of gene rearrangements in the mt-genomes [8–12]. In the order Phthiraptera, the three sequenced mt-genomes [GenBank: NC002651, GenBank: NC009983, GenBank: NC007884] show different gene arrangements. Within the suborder Sternorrhyncha of order Hemiptera, six of the eight sequenced mt-genomes [GenBank: NC005939, GenBank: NC006159, GenBank: NC006160, GenBank: NC006292, GenBank: NC006279, GenBank: NC006280] experienced gene rearrangements and are different from each other. But within the suborder Heteroptera of order Hemiptera, the gene order of the only sequenced mt-genome, Triatoma dimidiata (Latreille, 1811) (Reduviidae) [GenBank: NC002609], is identical to that of Drosophila yakuba [GenBank: NC001322].
It has been proposed that such gene rearrangements as synapomorphies would not contribute much to resolving insect interordinal relationships, while using gene order to understand intraordinal relationships would be useful . We undertook to examine whether the gene arrangements are conservative in suborder Heteroptera, especially in the infraorder Pentatomomorpha, to elucidate the phylogenetic relationships within this group and deTermine whether the gene order could resolve the phylogenetic relationships at the superfamily level.
Pentatomomorpha is one of six infraorders of Heteroptera . It consists chiefly of terrestrial phytophagous bugs. Many of these bugs feed on seeds and plant vascular systems and some feed on fungi. This group of insects is economically important in agriculture and forestry, making robust phylogenetic hypotheses for the main lineages of significant interest. Views on the systematics of Pentatomomorpha before 1993 were summarized by Schaefer (1993) . Currently the classification system of Pentatomomorpha includes five superfamilies, Aradoidea, Pentatomoidea, Coreoidea, Lygaeoidea and Pyrrhocoroidea (the superfamilies except the Aradoidea are grouped as Trichophora) . There are different views of the phylogenetic relationships within Pentatomomorpha based on morphological or molecular evidence. Using morphological evidence, Schaefer (1993) argued that Aradoidea may be either the sister group of Trichophora or Cimicomorpha-cum-Pentatomomorpha . Sweet (1996) proposed that the Aradoidea constituted a separate heteropteran infraorder, Aradomorpha . Henry (1997) suggested the relationship (Aradoidea + (Pentatomoidea + ((Coreoidea + Pyrrhocoroidea) + (Idiostoloidea + Lygaeoidea)))) . Using molecular evidence, Xie et al (2005) performed Bayesian analysis with the 18S rDNA data set on the main lineages of Trichophora, giving the hypothesis of (Pentatomoidea + (Pyrrhocoroidea + (Coreoidea + Lygaeoidea))) and raised the taxon Eutrichophora (including the superfamilies of Trichophora except Pentatomoidea) and Neotrichophora (including Coreoidea and Lygaeoidea), but the relationships within the Neotrichophora were not completely resolved . Li et al. (2005) proposed the hypothesis that (Aradoidea + (Pentatomoidea + ("Pyrrhocoroidea"+ "Coreoidea" + "Lygaeoidea"))) based on the partial sequence of the CO1 and 18S rDNA . Li et al. (2005), however, found that the superfamilies Pyrrhocoroidea, Coreoidea, and Lygaeoidea were not monophyletic. However, Li et al. (2005) included several low quality sequences with many ambiguous nucleotides and a lack of subregions, and two CO1-like genes were used instead of the CO1 gene; also, many nodes in the cladogram appeared with bootstrap values lower than 50%. Additionally, Wheeler et al (1993) supported the monophyly of Pentatomomorpha and a sister group relationship between Aradoidea and Trichophora based on evidence combining morphological characters and partial 18S rDNA .
To date, the hypothesis (Aradoidea + (Pentatomoidea + the remainder of Trichophora)) has been accepted by most researchers, based on morphological and molecular evidence [17–19, 21]. Within Pentatomomorpha, especially the Eutrichophora, the phylogenetic relationships among the superfamilies are still controversial.
For a robust phylogenetic hypothesis of the higher-level relationships in any organism, multiple genes and large sequence lengths are favored [22–24]. However, the DNA sequences that have been used for phylogenetic analyses of Pentatomomorpha are no longer than 1.6 kb. Thus we hoped to resolve the phylogeny of Pentatomomorpha with the mt-genomes, which are the largest set of homologous genes that can be conveniently compared across animal taxa . No bioinformatic studies on the mt-genomes of Pentatomomorpha have been done.
Recombination has been considered to be absent in the animal mt-genome , but recombination has been observed in insect mtDNA . There are different views on whether the nucleotide substitution rate correlates with gene rearrangements [27–29].
We sequenced the mt-genomes of 15 heteropterans in this study with five main aims: 1) to explore the organization of their mt-genomes and infer possible evolutionary mechanisms; 2) to find out whether there is recombination in heteropteran mt-genomes; 3) to investigate the phylogenetic relationships among the superfamilies of Pentatomomorpha; 4) to identify the correlation between the nucleotide substitution rate and gene rearrangement; 5) to describe the bioinformatics of heteropteran mt-genomes.
General informatics of the mt-genomes in this study.
Total A+T %
AT% of control region
Saldula arsenjevi Vinokurov
Triatoma dimidiata (Latreille)
Orius niger Wolff
tRNA(I) - 12S
Neuroctenus parus Hsiao
Nezara viridula (Linnaeus)
Macroscytus subaeneus (Dallas)
tRNA(I) - 12S
Coptosoma bifaria Montandon
Yemmalysus parallelus Stusak
Phaenacantha marcida Horvath
tRNA(I) - 12S
Malcus inconspicuus Stys
Geocoris pallidipennis (Costa)
tRNA(I) - 12S
Physopelta gutta (Burmeister)
Dysdercus cingulatus (Fabricius)
Riptortus pedestris Fabricius
Hydaropsis longirostris (Hsiao)
Aeschyntelus notatus Hsiao
tRNA(I) - 12S
Within these genomes, all of the 37 genes commonly found in metazoan mt-genomes were deTermined: 2 for rRNAs, 22 for tRNAs, and 13 for proteins. Most of these genes are arranged in the same order as the putative ancestral insect mt-genome , with a few exceptions discussed later. We also found both intergenic spacers and genes that overlapped by several nucleotides. For discussion purposes, the strand containing the CO1 gene was designated as the major-strand (J-strand) because it encoded 23 genes. Consequently the other strand encoding the remaining 14 genes was designated as the minor-strand (N-strand).
In type B, Aradidae, the tRNA-Q translocated from the inferred ancestral position (tRNA-I-and-tRNA-Q-and-tRNA-M) to upstream of the tRNA-I with a 44 nt spacer which could be folded as a stem-loop structure between them.
In most of the sequenced mt-genomes of heteropterans, tRNA-T and tRNA-P form a concatenated unit and locate upstream to ND6 gene. But in Type C, Pyrrhocoroidea (including Largidae and Pyrrhocoridae), tRNA-T and tRNA-P switched their positions with two extra intergenic spacers in different locations. In Largidae, the spacers are 19 nt between ND4L and tRNA-P, and 240 nt between tRNA-T and ND6. In Pyrrhocoridae, the spacers are 33 nt between ND4L and tRNA-P, and 60 nt between tRNA-P and tRNA-T.
Furthermore, the intergenic spacer between tRNA-S (TGA) and ND1 occurs in all the mt-genomes except Pyrrhocoridae. All these spacers are no longer than 30 nt except Reduviidae which is longer than 300 nt. Some of these spacers may be folded as stem-loop structures, and in Reduviidae this spacer contains an unidentified open reading frame of 312 nt . Among these spacers, there is a CSB (Figure 2B), which might indicate that the extant heteropterans underwent a common inTermediate stage of the TDRL process, i.e., this spacer might be a synapomorphy of heteropterans.
In Malcidae, one subregion of the intergenic spacer between the tRNA-H and the ND4 has an exactly repeated counterpart in the control region (34 nt, Blast E-value: 2e-15) (see Additional file 1), and this subregion is also part of the CSB in Lygaeoidea. Considering that this intergenic spacer is shared among the members of Lygaeoidea, in Malcidae the copy of the repetition located between tRNA-H and ND4 is supposed to be the original one. This phenomenon could also be elucidated by the TDRL model wherein a large fragment from the intergenic spacer to the control region duplicated and then most of them were lost except the repetition. However, across the whole mt-genome, no vestige of such a large duplication was found. The possible alternative mechanism is recombination. It has been reported that the secondary structures formed by the tRNA genes or the self complementary sequences may play a major role as the hotspots in recombination [36, 37], and the control region is supposed to be a possible recombination hotspot . Interestingly, this intergenic spacer in Malcidae is capable of forming a stem-loop structure (see Additional file 1) adjacent to tRNA-H, and its repetition in the stem-loop-rich control region is adjacent to tRNA-I. So, we speculated that this duplication might be the product of illegitimate recombination via a mini-circle (Figure 4B). However, in Berytidae, no copy of this spacer was found across the mt-genomes. Further study is needed to investigate whether this repetition is the autapomorphy of Malcidae, since the control regions of Geocoridae and Colobathristidae could not be sequenced because of poly A and poly T.
The transfer RNA gene recruitment or the duplication and the remolding make it hard to identify the homologs for these short genes and could also affect the use of mitochondrial gene orders for the phylogenetic reconstructions [38, 39]. It has also been proposed that the tRNAs are much less homoplastic than the PCGs and should be included in the phylogenetic analyses . Thus, it is necessary to understand whether the tRNA genes are homologs before the phylogenetic analyses.
For the 22 tRNAs, each type holds the same anticodon respectively across the mt-genomes in this study. We aligned each type of tRNA and adjusted them according to their secondary structures (the alignments are the same as those used in phylogenetic analyses), and then the pairwise similarities within each type of tRNA were calculated by pairwise alignment using BioEdit (see Additional file 2). High similarity values could found within each type of the tRNAs, indicating that no tRNA recruitment or remolding [38, 39] happened and the rearrangements of the tRNAs in Aradidae, Largidae, and Pyrrhocoridae should be the transpositions of the tRNA coding sequences. Finally, we deTermined that each group of tRNAs used in the phylogenetic analyses are homologous.
Our results also confirm that Aradoidea and the Trichophora are sister groups as indicated in previous studies [17–19, 21]. Hence, we argue that there is no necessity to raise Aradoidea from a superfamily to an infraorder, because the seven-infraorder classification of the Heteroptera has been accepted by most researchers [15, 20]. The sister groups' relationship of Pentatomoidea and the remainder of the Trichophora raised in this study also conforms to the morphological and nuclear rDNA data [17, 18]. In Eutrichophora, our study raises a phylogenetic hypothesis of (Pyrrhocoroidea + (Lygaeoidea + Coreoidea)). This result is different from Henry (1997)  but similar to Xie et al (2005) . Since this study did not include all the families of each superfamily, family relationships within each superfamily, remain incomplete. The Rhophalidae presents a sister position to the (Coreidae+Alydidae) and the monophyly of the (Coreidae+Alydidae) is also accepted by Henry (1997)  and Xie et al (2005) . In Lygaeoidea, our conclusion is (Colobathristidae + (Berytidae + (Geocoridae + Malcidae))), which is different from Henry (1997) , whose hypothesis could be summarized as (Geocoridae + (Malcidae + (Colobathristidae + Berytidae))). The monophyletic malcid line [17, 40] is broken down in our study because of the interposition of Geocoridae.
Surprisingly, the infraorder Cimicomorpha (including Reduviidae, Anthocoridae, and other families not included in this study), whose monophyly has been accepted [20, 41], was not monophyletic in this study. The Bayesian posterior probability value for the node (Anthocoridae + Pentatomomorpha) is 1.00. The mt-genome data in this study may be too limited to resolve the phylogeny of Cimicomorpha, and more mt-genomes of the additional taxa of Cimicomorpha need to be sequenced because increased taxon sampling could greatly reduce the phylogenetic error , or possibly the monophyly of Cimicomorpha needs to be revised.
Additionally, in Lygaeoidea the CSB in the intergenic spacers between tRNA-H and ND4 indicates that the members of Lygaeoidea have undergone a common inTermediate TDRL and this spacer is the synapomorphy of Lygaeoidea favoring its monophyly. Furthermore, the position switch of tRNA-T and tRNA-P also links Largidae and Pyrrhocoridae together.
Relative nucleotide substitution rate test among the lineages of Pentatomomorpha.
Nucleotide composition of the tRNA-T and the tRNA-P.
In general, the groups whose genes rearranged do not always exhibit significantly different substitution rates. So, there is no correlation between the rates of nucleotide substitution and gene rearrangement.
The nucleotide substitution rates averaged over all sequence pairs of each gene vary among genes (see Additional file 3A). The overall mean value of this ratio of each pair of PCGs is 0.639, while for the rRNAs it is 0.384. Furthermore, the concatenated 22 tRNAs present a lower ratio (0.241) than PCGs and rRNAs, as reported before . This is a reasonable result because the purifying selection constrains nucleotide divergence in the rRNAs and the tRNAs due to the secondary structures of these molecules and the specificity of their amino acid binding sites during the translation . ND6 presents the highest nucleotide substitution rate, while CO1 appears to be the lowest. Surprisingly, CO2, which is considered as one of the slowest evolving genes , presents a higher nucleotide substitution ratio than most of the PCGs. As shown in Additional file 3A, the number of synonymous substitutions per synonymous site of CO2 is the highest, but its number of nonsynonymous substitutions per nonsynonymous site is much lower, while this value for ATP8 is the highest. As far as the PCGs are concerned, at the amino acid level the cytochrome oxidase subunits (CO1, CO2, and CO3) and cytochrome b are the slowest evolving proteins and ATP8 is the fastest. The nucleotide substitution rates of the first and second codon positions of PCGs are always lower but more variable within each gene than the third codon positions (see Additional file 3B), which is most likely because of the different selective constraints on the different genes . Obviously, for PCGs the ratios of the nonsynonymous nucleotide changes (Ka) versus the synonymous nucleotide changes (Ks) are all below 1, indicating that they are evolving according to the purifying selection .
Negative correlations have been found between the nucleotide substitution rate and the CG content of each gene (P < 0.01), and between the Ka/Ks and the CG content of each PCG (P < 0.01), which indicate that the variation of CG content causes the different evolutionary patterns among genes. For ND1, there is positive correlation (P < 0.01) between the variations and the hydrophobicity of the amino acids, but negative correlation (P < 0.01) for ND6, indicating that, in many polar or nonpolar regions, the specific identify of amino acid residues may be more important than maintaining the polarity of these regions of ND1, while the opposite is true for ND6, as reported in other organisms .
Some PCGs do not possess complete stop codons but appear to be Terminated with a single T adjacent to a downstream tRNA gene on the same strand (data not shown, see the GenBank files under the Accession Numbers in Table 1). This is not peculiar the in mt-genome and a TAA stop codon is probably generated by posttranscriptional polyadenylation . RNase P has been found to take on some aspects of the structural conformation of the tRNAs for the recognition signals of the cleavage sites in the tRNA precursors, rather than the nucleotide sequences . Thus, we supposed that the site right before tRNA should be the cleavage site. However, the complete common stop codons always overlap several nucleotides within the downstream tRNA, which perhaps act as a "backup" to prevent the translational read through if the transcripts are not properly cleaved .
Nucleotide composition of the mt-genomes in this study
Length of total genes (bp)
Length of control region (bp)
PCGs of J-strand
PCGs of N-strand
Because there are only 22 tRNA genes in one mt-genome, one tRNA must recognize more than one type of codon. However, the most frequent codons do not always correspond with their cognates in the mitochondrial tRNAs (see Additional file 4). It appears that the recognition of the cognates between the anticodons and the codons is not related to translation efficiency. Furthermore, the most infrequently used fourfold degenerate codons are NNG, which is because in the oxidative conditions of the mitochondria, the low content of the less stable G acts on the base composition of the mRNAs to preserve their life span, rather than the DNA . Overall, the most frequently used codons are NNA, which could be explained by the hypothesis put forward by Xia (1996) that ATP is much more abundant than the other three ribonucleotides in mitochondria, and the most frequently used ribonucleotide at the third codon positions in the mRNA should be the same as the most abundant ribonucleotide in the cellular matrix where the mRNA is transcribed .
Based on the DNA sequences, most of the tRNAs could be folded as cloverleaf secondary structures. A few of them possessed non-Watson-Crick matches, aberrant loops, or even extremely short arms. Most of the tRNA-S (GCT) lost their DHU arms, which is not rare in insect mt-genomes. In some cases, the alternative folding could yield a structure with an extremely short DHU stem and a small loop. It has been supposed that these characteristics in mt-genome are partly because the mtDNA is not subject to the process of recombination, which may facilitate the elimination of deleterious mutations . However, recombination in insect mtDNA has been observed . It is not known whether the aberrant tRNAs lose their function in every case, but there are reports of the recruitment of nuclear tRNAs into the mitochondria [60, 61], and a type of RNA editing could recover the well-paired acceptor stem .
Since it is impossible to precisely deTermine the ends of rRNA genes by DNA sequence alone, they were assumed to extend to the boundaries of flanking genes [3, 54]. So the lrRNA was assumed to fill up the blanks between tRNA-V and tRNA-L (TAG). For the boundary between the srRNA gene and the non-coding putative control region, alignments with homologous sequences in other insect mt-genomes were applied to define the 5'-end of the gene [11, 63, 64].
Sizes of the complete mt-genomes of Heteroptera range from 14935 bp (Largidae) to 17191 bp (Alydidae) (Table 1). Total length of tRNAs, rRNAs, and PCGs of each mt-genome does not vary too much in this study, ranging from 14399 bp (Malcidae) to 14687 bp (Aradidae) (Table 4). So the variety of mt-genome size is mainly derived from the non-coding regions. Among these mt-genomes, some large intergenic spacers (the control regions located between srRNA and tRNA-I are not included) could be folded as stem-loop structures which may be potential control elements, and several short ORFs could be found with no significant similarity to any known insect proteins. Since mt-genomes without these "control elements" or "proteins" could still preserve their life, these spacers were supposed to be vestiges of their common or their own ancestral inTermediates generated by gene duplication-random loss process of rearrangement, so were the other short spacers.
As summarized by Cook (2005), the putative control regions of the arthropods often have any or all of these four motifs: a long sequence of thymines, tandemly repeated sequences, a subregion of even higher A+T content, and stem-loop structures . Among the 11 complete mt-genomes of heteropterans, lengths of the control regions range from 224 bp (Largidae) to 2400 bp (Alydidae) (Table 4). Besides the stretches of thymines in all of them, only some short tandemly repeated sequences could be found in several of the mt-genomes, such as the (TA)6 in Saldidae, the (TTGA)3 in Aradidae, the (AAATC)2 in Coreidae, and the (TTAA)3 twice separately in Malcidae. Interestingly, Alydidae, whose control region is the longest one in this study, had only several short tandemly repeated sequences no longer than 11 bp (see Additional file 5). In Reduviidae , Berytidae, Largidae, Pentatomidae, Plataspidae, and Pyrrhocoridae, several repetitions larger than 100 bp have been found. All of the large repetitions and some large non-repetitions could be folded into stem-loop structures. However, no CSB has been found among these control regions. Because the control regions show extreme variability across the taxa and even among closely related species, primary sequence of much of the control region does not appear to be particularly important for regulatory function . Among these 11 complete heteropteran mt-genomes, the control regions are not always the most AT-rich regions (Table 1). Thus the common alternative Term for the control region, the AT-rich region, should probably not be used even though each control region had one or several more AT-rich subregions. Interestingly, in most cases the large tandemly repeated sequences are more AT-rich but not always the most, especially in Plataspidae, where large tandem repetitions consisting of (117 bp)4 are the most AT-rich subregions (calculated from J-strand) (see Additional file 6).
In this study, the data sets of mt-genomes in Heteroptera were enlarged. These true bugs experienced a high rate of the genome reorganization; translocations of tRNAs were found in Aradidae, Pyrrhocoridae, and Largidae, recombination events were observed in Alydidae and Malcidae, and a tetranucleotide ATCA was inferred to be the initiation codon of ND2 in Cydnidae. So, considering the mt-genomic sequences developed in previous studies, the hemipteroid insects are extremely complex groups exhibiting great mt-genomic diversity and these insects are well worth further study.
In infraorder Pentatomomorpha, phylogenetic analyses based on nucleotide sequences favor the monophyly of each superfamily (the monophyly of Aradoidea has been generally accepted though it has not been analyzed in this study) and raise a robust hypothesis of (Aradoidea + (Pentatomoidea + (Pyrrhocoroidea + (Lygaeoidea + Coreoidea)))). Rearrangement of tRNA-T and tRNA-P also favor the monophyly of Pyrrhocoroidea. Furthermore, the CSB in the intergenic spacers between tRNA-H and ND4 favor the monophyly of Lygaeoidea.
Nucleotide sequence of mt-genome is a powerful molecular tool for resolving insect phylogenetic relationships at the level of the superfamily and family. However, use of gene arrangement for phylogenetic analysis in such hierarchies is limited because of its rarity. In future studies, larger taxa sampling is needed to establish the feasibility of using gene arrangement for phylogenetic analysis.
Gene rearrangements are not correlated with nucleotide substitution rates. Different evolutionary patterns among the genes are derived from the variation of the CG content. CO1 is the slowest evolving protein and ATP8 is the fastest one. For ND1, in many polar or nonpolar regions, the specific identify of amino acid residues may be more important than maintaining the polarity of these regions, while the opposite is true for ND6. No CSB has been found among the control regions and many parts of the control regions do not appear to be particularly important for regulatory functions. The use of the Term "AT-rich region" to describe the control region should be reconsidered, because this region is not always the most AT-rich part of the mt-genome.
Thirteen exemplar species were sampled in the infraorder Pentatomomorpha, which represented 13 families of Pentatomomorpha and one species of Leptopodomorpha, and two species (one from GenBank) of Cimicomorpha (Table 1) were chosen as the outgroups according to Wheeler et al (1993)  and Schuh and Slater (1995) .
Adult specimens used in this study were collected in Yunnan Province, Guizhou Province and Tianjin City, China, in recent years. All specimens were preserved in 95% ethanol in the field. After transport to the laboratory, they were stored at -20°C until DNA extraction.
Total genomic DNA was extracted from thorax muscle tissue using a CTAB-based method . The mt-genomes were amplified with two to four overlapped fragments. The primers of co1F, cytbF, and cytbR were modified from Simon et al. (1994) . The other primers were then designed from the sequenced fragments of the CO1 and the Cytb amplified with co1F & co1R and cytbF & cytbR or from the comparison results of the CO1, the CO2, and the Cytb of other insects. Each pair of primers special to each species to amplify two long fragments were named with the first three letters of the family name (see Additional file 7). PCR reactions were performed with a TaKaRa LA PCR Kit Ver.2.1 according to the manufacturer's recommendations. PCR thermal cycling included 1 minute initial denaturation at 94°C, 32 cycles of 20 seconds at 94°C, 1 minute at 50°C~60°C, and 1~10 minutes at 68°C, and a final elongation for 15 minutes at 72°C. The PCR products were electrophoresed in 0.7% agarose gel and purified, then sequenced on both strands with the primer walking method by Beijing Sunbiotech Co. Ltd.
Sequence assembly was done using BioEdit version 7.0 . Protein coding genes (PCGs) were identified by ORF Finder implemented by the NCBI website http://www.ncbi.nlm.nih.gov/gorf/gorf.html with the invertebrate mitochondrial genetic codes and comparison with the published insect mitochondrial sequences with CLUSTAL X version 1.83 . Ribosomal RNA genes were assumed to extend to the boundaries of flanking genes and were also compared with the published insect mitochondrial sequences with CLUSTAL X. Transfer RNA analysis was conducted using tRNAscan-SE version 1.21  with the invertebrate mitochondrial codon predictors, and the tRNAs not detected by tRNAscan-SE were identified by BLAST search and comparison with the other deTermined heteropterans in this study. Analyses of sequences were performed with Phyltest , MEGA version 4.0 , DnaSP version 4.10.9  and DAMBE version 4.2.13 [74, 75].
Complete sequences of each gene were used in the phylogenetic analyses except the stop codons of the protein coding genes. All PCGs were aligned based on amino acid sequences aligned with MEGA, rRNAs and tRNAs were aligned with CLUSTAL X under the default settings. The ambiguously aligned regions of PCGs and rRNA genes were carefully adjusted "by eye". The alignments of tRNA genes were corrected according to the secondary structures, especially the stem regions. Then the aligned sequences were concatenated to be the N123RT data set including 2 rRNAs, 22 tRNAs and 13 PCGs. This data set was used to reconstruct the phylogenetic trees by ML and Bayesian analyses respectively. In both of the algorithms, the general reversible model (GTR + I + G) and the parameters optimized by Modeltest version 3.7  were used. ML analyses were conducted with PAUP* version 4.0b10 . Bayesian analyses were conducted with MrBayes version 3.1.1 , with two simultaneous runs for 1,000,000 generations with the first 25% discarded as burn-in. Another data set including only the 1st and 2nd codon positions of PCGs (the N12 data set) was also used to reconstruct the phylogeny by the same methods.
CO2, and CO3: cytochrome c oxidase subunit I, II, and III genes
cytochrome b gene
ATP synthase F0 subunit 6 and 8 genes
ND2, ND3, ND4, ND4L, ND5, ND6: NADH dehydrogenase subunit 1–6 and 4L genes
transfer RNA genes were labeled according to the IUPAC-IUB single letter code for the specified amino acidin cases where there was more than one tRNA for a particular amino acid, they were distinguished by their anticodons.
This project was supported by the Natural Science Foundation of China (No. 30725005 and No. 30570192), by the National Education Project in Basic Science for Special Subjects (Insect Systematics, No. J0630963), and by the Ministry of Education of China (No. 20050055027).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the Terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.