Comparative and phylogenomic studies on the mitochondrial genomes of Pentatomomorpha (Insecta: Hemiptera: Heteroptera)
© Hua et al; licensee BioMed Central Ltd. 2008
Received: 17 July 2008
Accepted: 17 December 2008
Published: 17 December 2008
Nucleotide sequences and the gene arrangements of mitochondrial genomes are effective tools for resolving phylogenetic problems. Hemipteroid insects are known to possess highly reorganized mitochondrial genomes, but in the suborder Heteroptera (Insecta: Hemiptera), there was only one complete mitochondrial genome sequenced without gene rearrangement and the phylogeny of infraorder Pentatomomorpha in Heteroptera was still uncertain.
Fifteen mitochondrial genomes of the suborder Heteroptera were sequenced. Gene rearrangements were found as follows: 1) tRNA-I and tRNA-Q switched positions in Aradidae, 2) tRNA-T and tRNA-P switched positions in Largidae and Pyrrhocoridae. Two recombination events were found in Alydidae and Malcidae. The other mt-genomes were organized in the same way as observed in Drosophila yakuba. The phylogenetic analyses of infraorder Pentatomomorpha based on the nucleotide sequence raised the hypothesis of (Aradoidea + (Pentatomoidea + (Pyrrhocoroidea + (Lygaeoidea + Coreoidea)))). The rearrangement of tRNA-T and tRNA-P also linked Largidae and Pyrrhocoridae together. Furthermore, the conserved sequence block in the unusual intergenic spacers between tRNA-H and ND4 favored the monophyly of Lygaeoidea. Tetranucleotide ATCA was inferred to be the initiation codon of ND2 in Cydnidae. No correlation was found between the rates of nucleotide substitution and gene rearrangement. CG content was significantly correlated with the nucleotide substitution rate of each gene. For ND1, there was a positive correlation (P < 0.01) between amino acids variations and hydrophobicity, but a negative correlation (P < 0.01) for ND6. No conserved sequence was found among the control regions and these regions were not always the most AT-rich region of the mt-genome.
Heteropteran insects are extremely complex groups worthy of further study because of the unusual tetranucleotide initiation codon and their great mt-genomic diversity, including gene rearrangements and recombinations. The mt-genome is a powerful molecular marker for resolving phylogeny at the level of the superfamily and family. Gene rearrangements were not correlated with nucleotide substitution rates. CG content variation caused the different evolutionary patterns among genes. For ND1, in many polar or nonpolar regions the specific identity of the amino acid residues might be more important than maintaining the polarity of these regions, while the opposite is true for ND6. Most sequences of the control regions did not appear to be important for regulatory functions. Finally, we suggest that the term "AT-rich regions" should not be used.
To date, more than one hundred complete or nearly complete insect mitochondrial genomes (mt-genomes) and thousands of partial sequences are available in GenBank/EMBL/DDBJ, and the data set is expanding at an increasing pace. Most animal mt-genomes contain the same 37 genes, but the orders of the genes are variable among and within the major groups . The mt-genome is becoming an effective tool to resolve phylogenetic problems due to its unique properties, including maternal inheritance, orthologous genes, and lack of the substantial intermolecular recombination . Patterns of mitochondrial gene rearrangements are also indicators of phylogenetic relationships [3–5]
In Insecta, mitochondrial gene rearrangements have been found in 11 orders (including some unpublished data) with the others exhibiting the same gene order observed in Drosophila yakuba [6, 7]. It has been found that the hemipteroid insects (including four orders, Hemiptera, Thysanoptera, Psocoptera, and Phthiraptera) have experienced increased rates of gene rearrangements in the mt-genomes [8–12]. In the order Phthiraptera, the three sequenced mt-genomes [GenBank: NC002651, GenBank: NC009983, GenBank: NC007884] show different gene arrangements. Within the suborder Sternorrhyncha of order Hemiptera, six of the eight sequenced mt-genomes [GenBank: NC005939, GenBank: NC006159, GenBank: NC006160, GenBank: NC006292, GenBank: NC006279, GenBank: NC006280] experienced gene rearrangements and are different from each other. But within the suborder Heteroptera of order Hemiptera, the gene order of the only sequenced mt-genome, Triatoma dimidiata (Latreille, 1811) (Reduviidae) [GenBank: NC002609], is identical to that of Drosophila yakuba [GenBank: NC001322].
It has been proposed that such gene rearrangements as synapomorphies would not contribute much to resolving insect interordinal relationships, while using gene order to understand intraordinal relationships would be useful . We undertook to examine whether the gene arrangements are conservative in suborder Heteroptera, especially in the infraorder Pentatomomorpha, to elucidate the phylogenetic relationships within this group and determine whether the gene order could resolve the phylogenetic relationships at the superfamily level.
Pentatomomorpha is one of six infraorders of Heteroptera . It consists chiefly of terrestrial phytophagous bugs. Many of these bugs feed on seeds and plant vascular systems and some feed on fungi. This group of insects is economically important in agriculture and forestry, making robust phylogenetic hypotheses for the main lineages of significant interest. Views on the systematics of Pentatomomorpha before 1993 were summarized by Schaefer (1993) . Currently the classification system of Pentatomomorpha includes five superfamilies, Aradoidea, Pentatomoidea, Coreoidea, Lygaeoidea and Pyrrhocoroidea (the superfamilies except the Aradoidea are grouped as Trichophora) . There are different views of the phylogenetic relationships within Pentatomomorpha based on morphological or molecular evidence. Using morphological evidence, Schaefer (1993) argued that Aradoidea may be either the sister group of Trichophora or Cimicomorpha-cum-Pentatomomorpha . Sweet (1996) proposed that the Aradoidea constituted a separate heteropteran infraorder, Aradomorpha . Henry (1997) suggested the relationship (Aradoidea + (Pentatomoidea + ((Coreoidea + Pyrrhocoroidea) + (Idiostoloidea + Lygaeoidea)))) . Using molecular evidence, Xie et al (2005) performed Bayesian analysis with the 18S rDNA data set on the main lineages of Trichophora, giving the hypothesis of (Pentatomoidea + (Pyrrhocoroidea + (Coreoidea + Lygaeoidea))) and raised the taxon Eutrichophora (including the superfamilies of Trichophora except Pentatomoidea) and Neotrichophora (including Coreoidea and Lygaeoidea), but the relationships within the Neotrichophora were not completely resolved . Li et al. (2005) proposed the hypothesis that (Aradoidea + (Pentatomoidea + ("Pyrrhocoroidea"+ "Coreoidea" + "Lygaeoidea"))) based on the partial sequence of the CO1 and 18S rDNA . Li et al. (2005), however, found that the superfamilies Pyrrhocoroidea, Coreoidea, and Lygaeoidea were not monophyletic. However, Li et al. (2005) included several low quality sequences with many ambiguous nucleotides and a lack of subregions, and two CO1-like genes were used instead of the CO1 gene; also, many nodes in the cladogram appeared with bootstrap values lower than 50%. Additionally, Wheeler et al (1993) supported the monophyly of Pentatomomorpha and a sister group relationship between Aradoidea and Trichophora based on evidence combining morphological characters and partial 18S rDNA .
To date, the hypothesis (Aradoidea + (Pentatomoidea + the remainder of Trichophora)) has been accepted by most researchers, based on morphological and molecular evidence [17–19, 21]. Within Pentatomomorpha, especially the Eutrichophora, the phylogenetic relationships among the superfamilies are still controversial.
For a robust phylogenetic hypothesis of the higher-level relationships in any organism, multiple genes and large sequence lengths are favored [22–24]. However, the DNA sequences that have been used for phylogenetic analyses of Pentatomomorpha are no longer than 1.6 kb. Thus we hoped to resolve the phylogeny of Pentatomomorpha with the mt-genomes, which are the largest set of homologous genes that can be conveniently compared across animal taxa . No bioinformatic studies on the mt-genomes of Pentatomomorpha have been done.
Recombination has been considered to be absent in the animal mt-genome , but recombination has been observed in insect mtDNA . There are different views on whether the nucleotide substitution rate correlates with gene rearrangements [27–29].
We sequenced the mt-genomes of 15 heteropterans in this study with five main aims: 1) to explore the organization of their mt-genomes and infer possible evolutionary mechanisms; 2) to find out whether there is recombination in heteropteran mt-genomes; 3) to investigate the phylogenetic relationships among the superfamilies of Pentatomomorpha; 4) to identify the correlation between the nucleotide substitution rate and gene rearrangement; 5) to describe the bioinformatics of heteropteran mt-genomes.
Results and discussion
General informatics of the mt-genomes in this study.
Total A+T %
AT% of control region
Saldula arsenjevi Vinokurov
Triatoma dimidiata (Latreille)
Orius niger Wolff
tRNA(I) - 12S
Neuroctenus parus Hsiao
Nezara viridula (Linnaeus)
Macroscytus subaeneus (Dallas)
tRNA(I) - 12S
Coptosoma bifaria Montandon
Yemmalysus parallelus Stusak
Phaenacantha marcida Horvath
tRNA(I) - 12S
Malcus inconspicuus Stys
Geocoris pallidipennis (Costa)
tRNA(I) - 12S
Physopelta gutta (Burmeister)
Dysdercus cingulatus (Fabricius)
Riptortus pedestris Fabricius
Hydaropsis longirostris (Hsiao)
Aeschyntelus notatus Hsiao
tRNA(I) - 12S
Within these genomes, all of the 37 genes commonly found in metazoan mt-genomes were determined: 2 for rRNAs, 22 for tRNAs, and 13 for proteins. Most of these genes are arranged in the same order as the putative ancestral insect mt-genome , with a few exceptions discussed later. We also found both intergenic spacers and genes that overlapped by several nucleotides. For discussion purposes, the strand containing the CO1 gene was designated as the major-strand (J-strand) because it encoded 23 genes. Consequently the other strand encoding the remaining 14 genes was designated as the minor-strand (N-strand).
In type B, Aradidae, the tRNA-Q translocated from the inferred ancestral position (tRNA-I-and-tRNA-Q-and-tRNA-M) to upstream of the tRNA-I with a 44 nt spacer which could be folded as a stem-loop structure between them.
In most of the sequenced mt-genomes of heteropterans, tRNA-T and tRNA-P form a concatenated unit and locate upstream to ND6 gene. But in Type C, Pyrrhocoroidea (including Largidae and Pyrrhocoridae), tRNA-T and tRNA-P switched their positions with two extra intergenic spacers in different locations. In Largidae, the spacers are 19 nt between ND4L and tRNA-P, and 240 nt between tRNA-T and ND6. In Pyrrhocoridae, the spacers are 33 nt between ND4L and tRNA-P, and 60 nt between tRNA-P and tRNA-T.
Furthermore, the intergenic spacer between tRNA-S (TGA) and ND1 occurs in all the mt-genomes except Pyrrhocoridae. All these spacers are no longer than 30 nt except Reduviidae which is longer than 300 nt. Some of these spacers may be folded as stem-loop structures, and in Reduviidae this spacer contains an unidentified open reading frame of 312 nt . Among these spacers, there is a CSB (Figure 2B), which might indicate that the extant heteropterans underwent a common intermediate stage of the TDRL process, i.e., this spacer might be a synapomorphy of heteropterans.
Possible evolutionary mechanisms of gene rearrangement
Recombination of mitochondrial DNA
In Malcidae, one subregion of the intergenic spacer between the tRNA-H and the ND4 has an exactly repeated counterpart in the control region (34 nt, Blast E-value: 2e-15) (see Additional file 1), and this subregion is also part of the CSB in Lygaeoidea. Considering that this intergenic spacer is shared among the members of Lygaeoidea, in Malcidae the copy of the repetition located between tRNA-H and ND4 is supposed to be the original one. This phenomenon could also be elucidated by the TDRL model wherein a large fragment from the intergenic spacer to the control region duplicated and then most of them were lost except the repetition. However, across the whole mt-genome, no vestige of such a large duplication was found. The possible alternative mechanism is recombination. It has been reported that the secondary structures formed by the tRNA genes or the self complementary sequences may play a major role as the hotspots in recombination [36, 37], and the control region is supposed to be a possible recombination hotspot . Interestingly, this intergenic spacer in Malcidae is capable of forming a stem-loop structure (see Additional file 1) adjacent to tRNA-H, and its repetition in the stem-loop-rich control region is adjacent to tRNA-I. So, we speculated that this duplication might be the product of illegitimate recombination via a mini-circle (Figure 4B). However, in Berytidae, no copy of this spacer was found across the mt-genomes. Further study is needed to investigate whether this repetition is the autapomorphy of Malcidae, since the control regions of Geocoridae and Colobathristidae could not be sequenced because of poly A and poly T.
Use of the tRNAs in phylogenetic analysis
The transfer RNA gene recruitment or the duplication and the remolding make it hard to identify the homologs for these short genes and could also affect the use of mitochondrial gene orders for the phylogenetic reconstructions [38, 39]. It has also been proposed that the tRNAs are much less homoplastic than the PCGs and should be included in the phylogenetic analyses . Thus, it is necessary to understand whether the tRNA genes are homologs before the phylogenetic analyses.
For the 22 tRNAs, each type holds the same anticodon respectively across the mt-genomes in this study. We aligned each type of tRNA and adjusted them according to their secondary structures (the alignments are the same as those used in phylogenetic analyses), and then the pairwise similarities within each type of tRNA were calculated by pairwise alignment using BioEdit (see Additional file 2). High similarity values could found within each type of the tRNAs, indicating that no tRNA recruitment or remolding [38, 39] happened and the rearrangements of the tRNAs in Aradidae, Largidae, and Pyrrhocoridae should be the transpositions of the tRNA coding sequences. Finally, we determined that each group of tRNAs used in the phylogenetic analyses are homologous.
Phylogeny of Pentatomomorpha
Our results also confirm that Aradoidea and the Trichophora are sister groups as indicated in previous studies [17–19, 21]. Hence, we argue that there is no necessity to raise Aradoidea from a superfamily to an infraorder, because the seven-infraorder classification of the Heteroptera has been accepted by most researchers [15, 20]. The sister groups' relationship of Pentatomoidea and the remainder of the Trichophora raised in this study also conforms to the morphological and nuclear rDNA data [17, 18]. In Eutrichophora, our study raises a phylogenetic hypothesis of (Pyrrhocoroidea + (Lygaeoidea + Coreoidea)). This result is different from Henry (1997)  but similar to Xie et al (2005) . Since this study did not include all the families of each superfamily, family relationships within each superfamily, remain incomplete. The Rhophalidae presents a sister position to the (Coreidae+Alydidae) and the monophyly of the (Coreidae+Alydidae) is also accepted by Henry (1997)  and Xie et al (2005) . In Lygaeoidea, our conclusion is (Colobathristidae + (Berytidae + (Geocoridae + Malcidae))), which is different from Henry (1997) , whose hypothesis could be summarized as (Geocoridae + (Malcidae + (Colobathristidae + Berytidae))). The monophyletic malcid line [17, 40] is broken down in our study because of the interposition of Geocoridae.
Surprisingly, the infraorder Cimicomorpha (including Reduviidae, Anthocoridae, and other families not included in this study), whose monophyly has been accepted [20, 41], was not monophyletic in this study. The Bayesian posterior probability value for the node (Anthocoridae + Pentatomomorpha) is 1.00. The mt-genome data in this study may be too limited to resolve the phylogeny of Cimicomorpha, and more mt-genomes of the additional taxa of Cimicomorpha need to be sequenced because increased taxon sampling could greatly reduce the phylogenetic error , or possibly the monophyly of Cimicomorpha needs to be revised.
Additionally, in Lygaeoidea the CSB in the intergenic spacers between tRNA-H and ND4 indicates that the members of Lygaeoidea have undergone a common intermediate TDRL and this spacer is the synapomorphy of Lygaeoidea favoring its monophyly. Furthermore, the position switch of tRNA-T and tRNA-P also links Largidae and Pyrrhocoridae together.
Correlation between rates of nucleotide substitution and gene rearrangement
Relative nucleotide substitution rate test among the lineages of Pentatomomorpha.
Nucleotide composition of the tRNA-T and the tRNA-P.
In general, the groups whose genes rearranged do not always exhibit significantly different substitution rates. So, there is no correlation between the rates of nucleotide substitution and gene rearrangement.
Different evolutionary patterns among genes
The nucleotide substitution rates averaged over all sequence pairs of each gene vary among genes (see Additional file 3A). The overall mean value of this ratio of each pair of PCGs is 0.639, while for the rRNAs it is 0.384. Furthermore, the concatenated 22 tRNAs present a lower ratio (0.241) than PCGs and rRNAs, as reported before . This is a reasonable result because the purifying selection constrains nucleotide divergence in the rRNAs and the tRNAs due to the secondary structures of these molecules and the specificity of their amino acid binding sites during the translation . ND6 presents the highest nucleotide substitution rate, while CO1 appears to be the lowest. Surprisingly, CO2, which is considered as one of the slowest evolving genes , presents a higher nucleotide substitution ratio than most of the PCGs. As shown in Additional file 3A, the number of synonymous substitutions per synonymous site of CO2 is the highest, but its number of nonsynonymous substitutions per nonsynonymous site is much lower, while this value for ATP8 is the highest. As far as the PCGs are concerned, at the amino acid level the cytochrome oxidase subunits (CO1, CO2, and CO3) and cytochrome b are the slowest evolving proteins and ATP8 is the fastest. The nucleotide substitution rates of the first and second codon positions of PCGs are always lower but more variable within each gene than the third codon positions (see Additional file 3B), which is most likely because of the different selective constraints on the different genes . Obviously, for PCGs the ratios of the nonsynonymous nucleotide changes (Ka) versus the synonymous nucleotide changes (Ks) are all below 1, indicating that they are evolving according to the purifying selection .
Negative correlations have been found between the nucleotide substitution rate and the CG content of each gene (P < 0.01), and between the Ka/Ks and the CG content of each PCG (P < 0.01), which indicate that the variation of CG content causes the different evolutionary patterns among genes. For ND1, there is positive correlation (P < 0.01) between the variations and the hydrophobicity of the amino acids, but negative correlation (P < 0.01) for ND6, indicating that, in many polar or nonpolar regions, the specific identify of amino acid residues may be more important than maintaining the polarity of these regions of ND1, while the opposite is true for ND6, as reported in other organisms .
Codon usage and nucleotide composition
Some PCGs do not possess complete stop codons but appear to be terminated with a single T adjacent to a downstream tRNA gene on the same strand (data not shown, see the GenBank files under the Accession Numbers in Table 1). This is not peculiar the in mt-genome and a TAA stop codon is probably generated by posttranscriptional polyadenylation . RNase P has been found to take on some aspects of the structural conformation of the tRNAs for the recognition signals of the cleavage sites in the tRNA precursors, rather than the nucleotide sequences . Thus, we supposed that the site right before tRNA should be the cleavage site. However, the complete common stop codons always overlap several nucleotides within the downstream tRNA, which perhaps act as a "backup" to prevent the translational read through if the transcripts are not properly cleaved .
Nucleotide composition of the mt-genomes in this study
Length of total genes (bp)
Length of control region (bp)
PCGs of J-strand
PCGs of N-strand
Because there are only 22 tRNA genes in one mt-genome, one tRNA must recognize more than one type of codon. However, the most frequent codons do not always correspond with their cognates in the mitochondrial tRNAs (see Additional file 4). It appears that the recognition of the cognates between the anticodons and the codons is not related to translation efficiency. Furthermore, the most infrequently used fourfold degenerate codons are NNG, which is because in the oxidative conditions of the mitochondria, the low content of the less stable G acts on the base composition of the mRNAs to preserve their life span, rather than the DNA . Overall, the most frequently used codons are NNA, which could be explained by the hypothesis put forward by Xia (1996) that ATP is much more abundant than the other three ribonucleotides in mitochondria, and the most frequently used ribonucleotide at the third codon positions in the mRNA should be the same as the most abundant ribonucleotide in the cellular matrix where the mRNA is transcribed .
tRNAs and rRNAs
Based on the DNA sequences, most of the tRNAs could be folded as cloverleaf secondary structures. A few of them possessed non-Watson-Crick matches, aberrant loops, or even extremely short arms. Most of the tRNA-S (GCT) lost their DHU arms, which is not rare in insect mt-genomes. In some cases, the alternative folding could yield a structure with an extremely short DHU stem and a small loop. It has been supposed that these characteristics in mt-genome are partly because the mtDNA is not subject to the process of recombination, which may facilitate the elimination of deleterious mutations . However, recombination in insect mtDNA has been observed . It is not known whether the aberrant tRNAs lose their function in every case, but there are reports of the recruitment of nuclear tRNAs into the mitochondria [60, 61], and a type of RNA editing could recover the well-paired acceptor stem .
Since it is impossible to precisely determine the ends of rRNA genes by DNA sequence alone, they were assumed to extend to the boundaries of flanking genes [3, 54]. So the lrRNA was assumed to fill up the blanks between tRNA-V and tRNA-L (TAG). For the boundary between the srRNA gene and the non-coding putative control region, alignments with homologous sequences in other insect mt-genomes were applied to define the 5'-end of the gene [11, 63, 64].
Sizes of the complete mt-genomes of Heteroptera range from 14935 bp (Largidae) to 17191 bp (Alydidae) (Table 1). Total length of tRNAs, rRNAs, and PCGs of each mt-genome does not vary too much in this study, ranging from 14399 bp (Malcidae) to 14687 bp (Aradidae) (Table 4). So the variety of mt-genome size is mainly derived from the non-coding regions. Among these mt-genomes, some large intergenic spacers (the control regions located between srRNA and tRNA-I are not included) could be folded as stem-loop structures which may be potential control elements, and several short ORFs could be found with no significant similarity to any known insect proteins. Since mt-genomes without these "control elements" or "proteins" could still preserve their life, these spacers were supposed to be vestiges of their common or their own ancestral intermediates generated by gene duplication-random loss process of rearrangement, so were the other short spacers.
As summarized by Cook (2005), the putative control regions of the arthropods often have any or all of these four motifs: a long sequence of thymines, tandemly repeated sequences, a subregion of even higher A+T content, and stem-loop structures . Among the 11 complete mt-genomes of heteropterans, lengths of the control regions range from 224 bp (Largidae) to 2400 bp (Alydidae) (Table 4). Besides the stretches of thymines in all of them, only some short tandemly repeated sequences could be found in several of the mt-genomes, such as the (TA)6 in Saldidae, the (TTGA)3 in Aradidae, the (AAATC)2 in Coreidae, and the (TTAA)3 twice separately in Malcidae. Interestingly, Alydidae, whose control region is the longest one in this study, had only several short tandemly repeated sequences no longer than 11 bp (see Additional file 5). In Reduviidae , Berytidae, Largidae, Pentatomidae, Plataspidae, and Pyrrhocoridae, several repetitions larger than 100 bp have been found. All of the large repetitions and some large non-repetitions could be folded into stem-loop structures. However, no CSB has been found among these control regions. Because the control regions show extreme variability across the taxa and even among closely related species, primary sequence of much of the control region does not appear to be particularly important for regulatory function . Among these 11 complete heteropteran mt-genomes, the control regions are not always the most AT-rich regions (Table 1). Thus the common alternative term for the control region, the AT-rich region, should probably not be used even though each control region had one or several more AT-rich subregions. Interestingly, in most cases the large tandemly repeated sequences are more AT-rich but not always the most, especially in Plataspidae, where large tandem repetitions consisting of (117 bp)4 are the most AT-rich subregions (calculated from J-strand) (see Additional file 6).
In this study, the data sets of mt-genomes in Heteroptera were enlarged. These true bugs experienced a high rate of the genome reorganization; translocations of tRNAs were found in Aradidae, Pyrrhocoridae, and Largidae, recombination events were observed in Alydidae and Malcidae, and a tetranucleotide ATCA was inferred to be the initiation codon of ND2 in Cydnidae. So, considering the mt-genomic sequences developed in previous studies, the hemipteroid insects are extremely complex groups exhibiting great mt-genomic diversity and these insects are well worth further study.
In infraorder Pentatomomorpha, phylogenetic analyses based on nucleotide sequences favor the monophyly of each superfamily (the monophyly of Aradoidea has been generally accepted though it has not been analyzed in this study) and raise a robust hypothesis of (Aradoidea + (Pentatomoidea + (Pyrrhocoroidea + (Lygaeoidea + Coreoidea)))). Rearrangement of tRNA-T and tRNA-P also favor the monophyly of Pyrrhocoroidea. Furthermore, the CSB in the intergenic spacers between tRNA-H and ND4 favor the monophyly of Lygaeoidea.
Nucleotide sequence of mt-genome is a powerful molecular tool for resolving insect phylogenetic relationships at the level of the superfamily and family. However, use of gene arrangement for phylogenetic analysis in such hierarchies is limited because of its rarity. In future studies, larger taxa sampling is needed to establish the feasibility of using gene arrangement for phylogenetic analysis.
Gene rearrangements are not correlated with nucleotide substitution rates. Different evolutionary patterns among the genes are derived from the variation of the CG content. CO1 is the slowest evolving protein and ATP8 is the fastest one. For ND1, in many polar or nonpolar regions, the specific identify of amino acid residues may be more important than maintaining the polarity of these regions, while the opposite is true for ND6. No CSB has been found among the control regions and many parts of the control regions do not appear to be particularly important for regulatory functions. The use of the term "AT-rich region" to describe the control region should be reconsidered, because this region is not always the most AT-rich part of the mt-genome.
Materials and methods
Thirteen exemplar species were sampled in the infraorder Pentatomomorpha, which represented 13 families of Pentatomomorpha and one species of Leptopodomorpha, and two species (one from GenBank) of Cimicomorpha (Table 1) were chosen as the outgroups according to Wheeler et al (1993)  and Schuh and Slater (1995) .
Adult specimens used in this study were collected in Yunnan Province, Guizhou Province and Tianjin City, China, in recent years. All specimens were preserved in 95% ethanol in the field. After transport to the laboratory, they were stored at -20°C until DNA extraction.
Mitochondrial DNA preparation, amplification and sequencing
Total genomic DNA was extracted from thorax muscle tissue using a CTAB-based method . The mt-genomes were amplified with two to four overlapped fragments. The primers of co1F, cytbF, and cytbR were modified from Simon et al. (1994) . The other primers were then designed from the sequenced fragments of the CO1 and the Cytb amplified with co1F & co1R and cytbF & cytbR or from the comparison results of the CO1, the CO2, and the Cytb of other insects. Each pair of primers special to each species to amplify two long fragments were named with the first three letters of the family name (see Additional file 7). PCR reactions were performed with a TaKaRa LA PCR Kit Ver.2.1 according to the manufacturer's recommendations. PCR thermal cycling included 1 minute initial denaturation at 94°C, 32 cycles of 20 seconds at 94°C, 1 minute at 50°C~60°C, and 1~10 minutes at 68°C, and a final elongation for 15 minutes at 72°C. The PCR products were electrophoresed in 0.7% agarose gel and purified, then sequenced on both strands with the primer walking method by Beijing Sunbiotech Co. Ltd.
Sequence assembly and analysis
Sequence assembly was done using BioEdit version 7.0 . Protein coding genes (PCGs) were identified by ORF Finder implemented by the NCBI website http://www.ncbi.nlm.nih.gov/gorf/gorf.html with the invertebrate mitochondrial genetic codes and comparison with the published insect mitochondrial sequences with CLUSTAL X version 1.83 . Ribosomal RNA genes were assumed to extend to the boundaries of flanking genes and were also compared with the published insect mitochondrial sequences with CLUSTAL X. Transfer RNA analysis was conducted using tRNAscan-SE version 1.21  with the invertebrate mitochondrial codon predictors, and the tRNAs not detected by tRNAscan-SE were identified by BLAST search and comparison with the other determined heteropterans in this study. Analyses of sequences were performed with Phyltest , MEGA version 4.0 , DnaSP version 4.10.9  and DAMBE version 4.2.13 [74, 75].
Complete sequences of each gene were used in the phylogenetic analyses except the stop codons of the protein coding genes. All PCGs were aligned based on amino acid sequences aligned with MEGA, rRNAs and tRNAs were aligned with CLUSTAL X under the default settings. The ambiguously aligned regions of PCGs and rRNA genes were carefully adjusted "by eye". The alignments of tRNA genes were corrected according to the secondary structures, especially the stem regions. Then the aligned sequences were concatenated to be the N123RT data set including 2 rRNAs, 22 tRNAs and 13 PCGs. This data set was used to reconstruct the phylogenetic trees by ML and Bayesian analyses respectively. In both of the algorithms, the general reversible model (GTR + I + G) and the parameters optimized by Modeltest version 3.7  were used. ML analyses were conducted with PAUP* version 4.0b10 . Bayesian analyses were conducted with MrBayes version 3.1.1 , with two simultaneous runs for 1,000,000 generations with the first 25% discarded as burn-in. Another data set including only the 1st and 2nd codon positions of PCGs (the N12 data set) was also used to reconstruct the phylogeny by the same methods.
- CO1 :
CO2, and CO3: cytochrome c oxidase subunit I, II, and III genes
- CytB :
cytochrome b gene
- ATP6 and ATP8:
ATP synthase F0 subunit 6 and 8 genes
- ND1 ND2, ND3, ND4, ND4L, ND5, ND6 :
NADH dehydrogenase subunit 1–6 and 4L genes; transfer RNA genes were labeled according to the IUPAC-IUB single letter code for the specified amino acid; in cases where there was more than one tRNA for a particular amino acid; they were distinguished by their anticodons.
This project was supported by the Natural Science Foundation of China (No. 30725005 and No. 30570192), by the National Education Project in Basic Science for Special Subjects (Insect Systematics, No. J0630963), and by the Ministry of Education of China (No. 20050055027).
- San Mauro D, Gower DJ, Zardoya R, Wilkinson M: A hotspot of gene order rearrangement by tandem duplication and random loss in the vertebrate mitochondrial genome. Mol Biol Evol. 2006, 23: 227-234. 10.1093/molbev/msj025.PubMedView ArticleGoogle Scholar
- Dellaporta SL, Xu A, Sagasser S, Jakob W, Moreno MA, Buss LW, Schierwater B: Mitochondrial genome of Trichoplax adhaerens supports Placozoa as the basal lower metazoan phylum. Proc Natl Acad Sci USA. 2006Google Scholar
- Boore JL: Complete mitochondrial genome sequence of the polychaete annelid Platynereis dumerilii. Mol Biol Evol. 2001, 18: 1413-1416.PubMedView ArticleGoogle Scholar
- Boore JL, Brown WM: Big trees from little genomes: mitochondrial gene order as a phylogenetic tool. Curr Opin Genet Dev. 1998, 8: 668-674. 10.1016/S0959-437X(98)80035-X.PubMedView ArticleGoogle Scholar
- Larget B, Simon DL, Kadane JB: Bayesian phylogenetic inference from animal mitochondrial genome arrangements. Journal of the Royal Statistical Society Series B-Statistical Methodology. 2002, 64: 681-693. 10.1111/1467-9868.00356.View ArticleGoogle Scholar
- Cameron SL, Beckenbach AT, Dowton M, Whiting MF: Evidence from mitochondrial genomics on interordinal relationships in insects. Arthropod Systematics & Phylogeny. 2006, 64: 27-34.Google Scholar
- Clary DO, Wolstenholme DR: The mitochondrial DNA molecular of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J Mol Evol. 1985, 22: 252-271. 10.1007/BF02099755.PubMedView ArticleGoogle Scholar
- Covacin C, Shao R, Cameron S, Barker SC: Extraordinary number of gene rearrangements in the mitochondrial genomes of lice (Phthiraptera: Insecta). Insect Mol Biol. 2006, 15: 63-68. 10.1111/j.1365-2583.2005.00608.x.PubMedView ArticleGoogle Scholar
- Shao R, Barker SC: The highly rearranged mitochondrial genome of the plague thrips, Thrips imaginis (Insecta: Thysanoptera): convergence of two novel gene boundaries and an extraordinary arrangement of rRNA genes. Mol Biol Evol. 2003, 20: 362-370. 10.1093/molbev/msg045.PubMedView ArticleGoogle Scholar
- Shao R, Campbell NJ, Barker SC: Numerous gene rearrangements in the mitochondrial genome of the wallaby louse, Heterodoxus macropus (Phthiraptera). Mol Biol Evol. 2001, 18: 858-865.PubMedView ArticleGoogle Scholar
- Shao R, Campbell NJ, Schmidt ER, Barker SC: Increased rate of gene rearrangement in the mitochondrial genomes of three orders of hemipteroid insects. Mol Biol Evol. 2001, 18: 1828-1832.PubMedView ArticleGoogle Scholar
- Thao ML, Baumann L, Baumann P: Organization of the mitochondrial genomes of whiteflies, aphids, and psyllids (Hemiptera, Sternorrhyncha). BMC Evol Biol. 2004, 4: 25-10.1186/1471-2148-4-25.PubMedPubMed CentralView ArticleGoogle Scholar
- Štys P, Kerzhner I: The rank and nomenclature of higher taxa in recent Heteroptera. Acta EntBohemoslov. 1975, 72: 65-79.Google Scholar
- Schaefer CW: The Pentatomomorpha (Hemiptera: Heteroptera): an annotated outline of its systematic history. EurJEntomol. 1993, 90: 105-122.Google Scholar
- Schuh RT, Slater JA: True Bugs of the World (Hemiptera: Heteroptera) Classification and Natural History. 1995, Cornell University Press, Ithaca, New YorkGoogle Scholar
- Sweet MH: The Comparative External Morphology of the Pregenital Abdomen of Hemiptera. Studies on Hemipteran Phylogeny. Edited by: Schaefer CW. 1996, Proceedings, Thomas Say Publications in Entomology, Entomological Society of America, Lanham, MD, 119-158.Google Scholar
- Henry TJ: Phylogenetic analysis of family groups within the infraorder Pentatomomorpha (Hemiptera: Heteroptera), with emphasis on the Lygaeoidea. AnnEntomolSocAm. 1997, 90: 275-301.Google Scholar
- Xie Q, Bu W, Zheng L: The Bayesian phylogenetic analysis of the 18S rRNA sequences from the main lineages of Trichophora (Insecta: Heteroptera: Pentatomomorpha). Mol Phylogenet Evol. 2005, 34: 448-451. 10.1016/j.ympev.2004.10.015.PubMedView ArticleGoogle Scholar
- Li HM, Deng RQ, Wang JW, Chen ZY, Jia FL, Wang XZ: A preliminary phylogeny of the Pentatomomorpha (Hemiptera: Heteroptera) based on nuclear 18S rDNA and mitochondrial DNA sequences. Mol Phylogenet Evol. 2005, 37: 313-326. 10.1016/j.ympev.2005.07.013.PubMedView ArticleGoogle Scholar
- Wheeler WC, Schuh RT, Bang R: Cladistic relationships among higher groups of Heteroptera: congruence between morphological and molecular data sets. EntomolScand. 1993, 24: 121-137.Google Scholar
- Leston D: Chromsome number and the systematics of Pentatomomorpha (Hemiptera). Proceedings of the Tenth International Congress of Entomologey. 1958, 2: 911-918.Google Scholar
- Mattern MY: Molecular phylogeny of the Gasterosteidae: the importance of using multiple genes. Mol Phylogenet Evol. 2004, 30: 366-377. 10.1016/S1055-7903(03)00190-8.PubMedView ArticleGoogle Scholar
- Miya M, Nishida M: Use of mitogenomic information in teleostean molecular phylogenetics: a tree-based exploration under the maximum-parsimony optimality criterion. Mol Phylogenet Evol. 2000, 17: 437-455. 10.1006/mpev.2000.0839.PubMedView ArticleGoogle Scholar
- Wortley AH, Rudall PJ, Harris DJ, Scotland RW: How much data are needed to resolve a difficult phylogeny? Case study in Lamiales. Syst Biol. 2005, 54: 697-709. 10.1080/10635150500221028.PubMedView ArticleGoogle Scholar
- Shadel GS, Clayton DA: Mitochondrial DNA maintenance in vertebrates. Annu Rev Biochem. 1997, 66: 409-435. 10.1146/annurev.biochem.66.1.409.PubMedView ArticleGoogle Scholar
- Tsaousis AD, Martin DP, Ladoukakis ED, Posada D, Zouros E: Widespread recombination in published animal mtDNA sequences. Mol Biol Evol. 2005, 22: 925-933. 10.1093/molbev/msi084.PubMedView ArticleGoogle Scholar
- Kurabayashi A, Sumida M, Yonekawa H, Glaw F, Vences M, Hasegawa M: Phylogeny, recombination, and mechanisms of stepwise mitochondrial genome reorganization in mantellid frogs from Madagascar. Mol Biol Evol. 2008Google Scholar
- Shao R, Dowton M, Murrell A, Barker SC: Rates of gene rearrangement and nucleotide substitution are correlated in the mitochondrial genomes of insects. Mol Biol Evol. 2003, 20: 1612-1619. 10.1093/molbev/msg176.PubMedView ArticleGoogle Scholar
- Xu W, Jameson D, Tang B, Higgs PG: The relationship between the rate of molecular evolution and the rate of genome rearrangement in animal mitochondrial genomes. Journal of Molecular Evolution. 2006, 63: 375-392. 10.1007/s00239-005-0246-5.PubMedView ArticleGoogle Scholar
- Boore JL: Animal mitochondrial genomes. Nucleic Acids Res. 1999, 27: 1767-1780. 10.1093/nar/27.8.1767.PubMedPubMed CentralView ArticleGoogle Scholar
- Gibb GC, Kardailsky O, Kimball RT, Braun EL, Penny D: Mitochondrial genomes and avian phylogeny: complex characters and resolvability without explosive radiations. Mol Biol Evol. 2007, 24: 269-280. 10.1093/molbev/msl158.PubMedView ArticleGoogle Scholar
- Dotson EM, Beard CB: Sequence and organization of the mitochondrial genome of the Chagas disease vector, Triatoma dimidiata. Insect Mol Biol. 2001, 10: 205-215. 10.1046/j.1365-2583.2001.00258.x.PubMedView ArticleGoogle Scholar
- Boore JL, Lavrov DV, Brown WM: Gene translocation links insects and crustaceans. Nature. 1998, 392: 667-668. 10.1038/33577.PubMedView ArticleGoogle Scholar
- Mueller RL, Boore JL: Molecular mechanisms of extensive mitochondrial gene rearrangement in plethodontid salamanders. Mol Biol Evol. 2005, 22: 2104-2112. 10.1093/molbev/msi204.PubMedView ArticleGoogle Scholar
- Dowton M, Austin AD: Evolutionary dynamics of a mitochondrial rearrangement "hot spot" in the Hymenoptera. Mol Biol Evol. 1999, 16: 298-309.PubMedView ArticleGoogle Scholar
- Stanton DJ, Daehler LL, Moritz CC, Brown WM: Sequences with the potential to form stem-and-loop structures are associated with coding-region duplications in animal mitochondrial DNA. Genetics. 1994, 137: 233-241.PubMedPubMed CentralGoogle Scholar
- Yu DJ, Xu L, Nardi F, Li JG, Zhang RJ: The complete nucleotide sequence of the mitochondrial genome of the oriental fruit fly, Bactrocera dorsalis (Diptera: Tephritidae). Gene. 2007, 396: 66-74. 10.1016/j.gene.2007.02.023.PubMedView ArticleGoogle Scholar
- Lavrov DV, Lang BF: Transfer RNA gene recruitment in mitochondrial DNA. Trends Genet. 2005, 21: 129-133. 10.1016/j.tig.2005.01.004.PubMedView ArticleGoogle Scholar
- Rawlings TA, Collins TM, Bieler R: Changing identities: tRNA duplication and remolding within animal mitochondrial genomes. Proc Natl Acad Sci USA. 2003, 100: 15700-15705. 10.1073/pnas.2535036100.PubMedPubMed CentralView ArticleGoogle Scholar
- Štys P: Monograph of Malcinae, with consideration of morphology and phylogeny of related groups (Heteroptera: Malcidae). Acta Entomol Mus Natl Prag. 1967, 37: 351-516.Google Scholar
- Schuh RT, Štys P: Phylogenetic analysis of cimicomorphan family relationships (Heteroptera). Journal Of The New York Entomological Society. 1991, 99: 98-350.Google Scholar
- Zwickl DJ, Hillis DM: Increased taxon sampling greatly reduces phylogenetic error. Systematic Biology. 2002, 51: 588-598. 10.1080/10635150290102339.PubMedView ArticleGoogle Scholar
- Kim I, Cha SY, Yoon MH, Hwang JS, Lee SM, Sohn HD, Jin BR: The complete nucleotide sequence and gene organization of the mitochondrial genome of the oriental mole cricket, Gryllotalpa orientalis (Orthoptera: Gryllotalpidae). Gene. 2005, 353: 155-168. 10.1016/j.gene.2005.04.019.PubMedView ArticleGoogle Scholar
- Graur D, Li WH: Fundamentals of Molecular Evolution. 1999, Mass: Sinauer Associates, SunderlandGoogle Scholar
- Saccone C, De Giorgi C, Gissi C, Pesole G, Reyes A: Evolutionary genomics in Metazoa: the mitochondrial DNA as a model system. Gene. 1999, 238: 195-209. 10.1016/S0378-1119(99)00270-X.PubMedView ArticleGoogle Scholar
- Saccone C, Gissi C, Lanave C, Larizza A, Pesole G, Reyes A: Evolution of the mitochondrial genetic system: an overview. Gene. 2000, 261: 153-159. 10.1016/S0378-1119(00)00484-4.PubMedView ArticleGoogle Scholar
- Roques S, Fox CJ, Villasana MI, Rico C: The complete mitochondrial genome of the whiting, Merlangius merlangus and the haddock, Melanogrammus aeglefinus: a detailed genomic comparison among closely related species of the Gadidae family. Gene. 2006, 383: 12-23. 10.1016/j.gene.2006.06.018.PubMedView ArticleGoogle Scholar
- Broughton RE, Milam JE, Roe BA: The complete sequence of the zebrafish (Danio rerio) mitochondrial genome and evolutionary patterns in vertebrate mitochondrial DNA. Genome Res. 2001, 11: 1958-1967.PubMedPubMed CentralGoogle Scholar
- Yukuhiro K, Sezutsu H, Itoh M, Shimizu K, Banno Y: Significant levels of sequence divergence and gene rearrangements have occurred between the mitochondrial genomes of the wild mulberry silkmoth, Bombyx mandarina, and its close relative, the domesticated silkmoth, Bombyx mori. Mol Biol Evol. 2002, 19: 1385-1389.PubMedView ArticleGoogle Scholar
- Ballard JW: Comparative genomics of mitochondrial DNA in members of the Drosophila melanogaster subgroup. J Mol Evol. 2000, 51: 48-63.PubMedGoogle Scholar
- Clary DO, Wolstenholme DR: Genes for cytochrome c oxidase subunit I, URF2, and three tRNAs in Drosophila mitochondrial DNA. Nucleic Acids Res. 1983, 11: 6859-6872. 10.1093/nar/11.19.6859.PubMedPubMed CentralView ArticleGoogle Scholar
- de Bruijn MH: Drosophila melanogaster mitochondrial DNA, a novel organization and genetic code. Nature. 1983, 304: 234-241. 10.1038/304234a0.PubMedView ArticleGoogle Scholar
- Ojala D, Montoya J, Attardi G: tRNA punctuation model of RNA processing in human mitochondria. Nature. 1981, 290: 470-474. 10.1038/290470a0.PubMedView ArticleGoogle Scholar
- Boore JL: The complete sequence of the mitochondrial genome of Nautilus macromphalus (Mollusca: Cephalopoda). BMC Genomics. 2006, 7: 182-10.1186/1471-2164-7-182.PubMedPubMed CentralView ArticleGoogle Scholar
- Perna NT, Kocher TD: Patterns of nucleotide composition at fourfold degenerate sites of animal mitochondrial genomes. J Mol Evol. 1995, 41: 353-358. 10.1007/BF01215182.PubMedView ArticleGoogle Scholar
- Cameron SL, Whiting MF: Mitochondrial genomic comparisons of the subterranean termites from the Genus Reticulitermes (Insecta: Isoptera: Rhinotermitidae). Genome. 2007, 50: 188-202. 10.1139/G06-148.PubMedView ArticleGoogle Scholar
- Gibson A, Gowri-Shankar V, Higgs PG, Rattray M: A comprehensive analysis of mammalian mitochondrial genome base composition and improved phylogenetic methods. Mol Biol Evol. 2005, 22: 251-264. 10.1093/molbev/msi012.PubMedView ArticleGoogle Scholar
- Xia X: Maximizing transcription efficiency causes codon usage bias. Genetics. 1996, 144: 1309-1320.PubMedPubMed CentralGoogle Scholar
- Lynch M: Mutation accumulation in nuclear, organelle, and prokaryotic transfer RNA genes. Mol Biol Evol. 1997, 14: 914-925.PubMedView ArticleGoogle Scholar
- Adams KL, Palmer JD: Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 2003, 29: 380-395. 10.1016/S1055-7903(03)00194-5.PubMedView ArticleGoogle Scholar
- Podsiadlowski L, Braband A: The complete mitochondrial genome of the sea spider Nymphon gracile (Arthropoda: Pycnogonida). BMC Genomics. 2006, 7: 284-10.1186/1471-2164-7-284.PubMedPubMed CentralView ArticleGoogle Scholar
- Lavrov DV, Brown WM, Boore JL: A novel type of RNA editing occurs in the mitochondrial tRNAs of the centipede Lithobius forficatus. Proc Natl Acad Sci USA. 2000, 97: 13738-13742. 10.1073/pnas.250402997.PubMedPubMed CentralView ArticleGoogle Scholar
- Boore JL, Brown WM: Mitochondrial genomes of Galathealinum, Helobdella, and Platynereis: sequence and gene arrangement comparisons indicate that Pogonophora is not a phylum and Annelida and Arthropoda are not sister taxa. Mol Biol Evol. 2000, 17: 87-106.PubMedView ArticleGoogle Scholar
- Wang X, Lavrov DV: Mitochondrial genome of thehomoscleromorph Oscarella carmela (Porifera, Demospongiae) reveals unexpected complexity in the common ancestor of sponges and other animals. Mol Biol Evol. 2007, 24: 363-373. 10.1093/molbev/msl167.PubMedView ArticleGoogle Scholar
- Cook CE: The complete mitochondrial genome of the stomatopod crustacean Squilla mantis. BMC Genomics. 2005, 6: 105-10.1186/1471-2164-6-105.PubMedPubMed CentralView ArticleGoogle Scholar
- Reineke A, Karlovsky P, Zebitz CP: Preparation and purification of DNA from insects for AFLP analysis. Insect Mol Biol. 1998, 7: 95-99. 10.1046/j.1365-2583.1998.71048.x.PubMedView ArticleGoogle Scholar
- Simon C, Frati F, Beckenbach A, Cresp B, Liu H, Flook P: Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann Entomol Soc Am. 1994, 87: 651-701.View ArticleGoogle Scholar
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Acids Symp Ser. 1999, 41: 95-98.Google Scholar
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.PubMedPubMed CentralView ArticleGoogle Scholar
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25: 955-964. 10.1093/nar/25.5.955.PubMedPubMed CentralView ArticleGoogle Scholar
- Kumar S: Phyltest: phylogeny hypothesis testing software. 1996, Pennsylvania: The Pennsylvania State UniversityGoogle Scholar
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.PubMedView ArticleGoogle Scholar
- Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R: DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003, 19: 2496-2497. 10.1093/bioinformatics/btg359.PubMedView ArticleGoogle Scholar
- Xia X: Data analysis in molecular biology and evolution. 2000, Boston: Kluwer Academic publishersGoogle Scholar
- Xia X, Xie Z: DAMBE: software package for data analysis in molecular biology and evolution. J Hered. 2001, 92: 371-373. 10.1093/jhered/92.4.371.PubMedView ArticleGoogle Scholar
- Posada D, Crandall KA: Modeltest: testing the model of DNA substitution. Bioinformatics. 1998, 14: 814-818. 10.1093/bioinformatics/14.9.817.View ArticleGoogle Scholar
- Swofford DL: PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods).Version 4. Sinauer Assocates, Sunderland, MA
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.PubMedView ArticleGoogle Scholar