Recovery of the mitochondrial COI barcode region in diverse Hexapoda through tRNA-based primers
© Park et al; licensee BioMed Central Ltd. 2010
Received: 26 March 2010
Accepted: 9 July 2010
Published: 9 July 2010
DNA barcoding uses a 650 bp segment of the mitochondrial cytochrome c oxidase I (COI) gene as the basis for an identification system for members of the animal kingdom and some other groups of eukaryotes. PCR amplification of the barcode region is a key step in the analytical chain, but it sometimes fails because of a lack of homology between the standard primer sets and target DNA.
Two forward PCR primers were developed following analysis of all known arthropod mitochondrial genome arrangements and sequence alignment of the tRNA-W gene, which is usually located within 200 bp upstream of the COI gene. These two primers were combined with a standard reverse primer (LepR1) to produce a cocktail which generated a barcode amplicon from 125 of 141 species that included representatives of 121 different families of Hexapoda. High quality sequences were recovered from 79% of the species, including groups, such as scale insects, that invariably fail to amplify with standard primers.
A cocktail of two tRNA-W forward primers coupled with a standard reverse primer amplifies COI for most hexapods, allowing characterization of the standard barcode primer binding region at the 5' end of COI as well as the barcode segment. The current results show that primers designed to bind to highly conserved gene regions upstream of COI will aid the amplification of this gene region in species where standard primers fail and will provide valuable information for designing primers for problem groups.
Since 2003, substantial effort has been directed toward the development of a DNA-based identification system for animal life, based upon the analysis of sequence diversity in the 5' region of the mitochondrial gene, cytochrome c oxidase 1 [1, 2]. Termed DNA barcoding, this approach relies upon the PCR amplification of the target gene region and its subsequent sequence characterization. Performance tests on numerous animal groups have established that the system ordinarily works well - sequence diversity in the 5' region of COI enables discrimination of more than 98% of animal species [3, 4]. However, past work has also revealed that standard primer cocktails fail to generate a PCR amplicon in certain taxonomic groups. In some cases, amplification success has been so low that researchers have suggested the need to study slower evolving gene regions that can be recovered more easily [5, 6]. Because such movement away from COI represents a serious compromise from the standardization that is fundamental to DNA barcoding, there is much incentive to develop new primer sets that enable recovery of the standard barcode region for 'problem' taxonomic groups.
The Arthropoda represent, by far, the most diverse of animal phyla. Although current primer sets generally perform well, there are some groups where barcode recovery has proven difficult. For example, barcode recoveries in the scale insects (Hemiptera, superfamily Coccoidea) are so low that it has been suggested that an identification system for this group must be developed using another gene. Past efforts to overcome this problem have tried to design primer sets that bind within the COI gene. In the present study, we adopt an alternate approach, one in which the search for primer sites is focused on the tRNA genes that lie upstream of the COI gene in most arthropod mitochondrial genomes. Because these genes have several highly conserved sequence blocks, they have been treated as attractive targets for primer design [8, 9]. However, two major complications might impede such usage. First, there is considerable sequence diversity even within the conserved blocks of tRNA genes, and only short stretches are truly conserved, so previously developed primers often failed to amplify the target gene. Second, the arrangement and orientation of tRNA genes vary: individual mitochondrial genes occasionally move from one position in the mitochondrial genome to another, and their orientation can switch between forward and reverse. These difficulties hinder the design of primers with broad effectiveness. In this study, we sought to identify a tRNA gene whose position and orientation are relatively stable and which offers greater universality than previously developed primers. Applications of the resulting primers for COI barcoding of the Hexapoda are also discussed.
Results and Discussion
Mitochondrial genome analysis, primer design and efficiency test
Mitochondrial gene arrangements upstream of the COI gene
Mitochondrial gene arrangement
W,-C,-Y; W,-C,-Y,-Y; W,-C; W,-Y; W
W,-C,-Y; W,-Q,-C (-Y); W,Y; W
W,-C,-Y; W,-Y; W,-C; W
W,-C; W, W,-Y; W,C,-Y
DNA barcoding analysis
From the 125 successful PCR amplicons, 111 clean sequences were recovered. The presence of shorter, non-specific amplicons was occasionally noted with the tRNA-W primer set, accounting for some failures in sequence recovery. The total length of the amplicons varied from 730 bp to 870 bp. The first 70-200 bp of each read consisted of sequence from one or more tRNA genes, the number depending on the gene arrangement, and was excised during sequence editing. The remainder of most sequences (107) provided enough coverage of the COI gene to gain formal barcode status (512-711 bp), but four sequences were shorter (279-497 bp). All sequences included 29-59 bp of sequence information for the far 5' end of the COI gene that is not recovered with the standard primer set, spanning the region from the presumed initiation codon to the 3' end of the LepF1 primer binding region.
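The recovery figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration in Python; the variable and function names are ours, and the counts are taken directly from the text (141 species tested, 125 amplicons, 111 clean sequences, of which 107 reached barcode length and four were shorter).

```python
# Counts taken from the text; the names are illustrative only.
SPECIES_TESTED = 141
AMPLICONS = 125
CLEAN_SEQUENCES = 111
BARCODE_LENGTH = 107   # sequences of 512-711 bp
SHORT = 4              # sequences of 279-497 bp


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of species that gave a PCR amplicon.
amplification_rate = percent(AMPLICONS, SPECIES_TESTED)
# Share of species that gave a clean sequence.
sequence_rate = percent(CLEAN_SEQUENCES, SPECIES_TESTED)
# Share of amplicons that gave a clean sequence.
sequencing_yield = percent(CLEAN_SEQUENCES, AMPLICONS)

print(amplification_rate, sequence_rate, sequencing_yield)
```

The second value, about 78.7%, is consistent with the 79% sequence recovery reported in the abstract, and the two length classes (107 and 4) account for all 111 clean sequences.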
In mitochondria, six codons (ATN, TTG and GTG) have been reported as canonical translation initiation codons in vertebrates and insects. However, there are several exceptions to this rule, especially in the COI gene. Previous reports suggested that quadruplets, such as ATAA, TTAA, TTAG, and ATTA, could be used as an initiation signal [12–14], although more recent evidence favours TCG or CGA over quadruplets, especially in Diptera and Lepidoptera, respectively. In this study, 70 of the 111 sequences began with one of six known arthropod initiation codons (58 ATN, 14 TTG and 1 GTG). TCG (including one CCG) was the predicted initiator in 22 cases, mostly in Diptera (19/22) with two cases from an orthopteran (1/1) and a coleopteran (1/19) (Additional file 2). Another possible initiator, the CGA codon, was found in 17 species distributed across Lepidoptera (9/12), Ephemeroptera (2/5), Diptera (1/22), Mantodea (1/1), Trichoptera (1/5), and Plecoptera (1/6). These assignments are plausible because a TAG or TAA stop codon is present at the beginning of the COI gene and no canonical initiator was found within 30 bp downstream of the stop codon. For two other sequences, the initiation codon was uncertain because the sequencing results lacked bidirectional coverage at the 5' end, owing to a short non-specific amplicon that was often present in PCR reactions that used tailed primers.
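The codon categories described above can be expressed as a small lookup. The following Python sketch is not the procedure used in this study; it only sorts the first codon of a trimmed COI sequence into the classes named in the text. The function name and the three labels are hypothetical, and the sketch does not attempt the additional check for an upstream stop codon and the 30 bp window.

```python
# Canonical initiators named in the text: ATN (four codons), TTG and GTG.
CANONICAL = {"ATA", "ATT", "ATC", "ATG", "TTG", "GTG"}
# Non-canonical candidates discussed in the text (TCG, its CCG variant, CGA).
PUTATIVE = {"TCG", "CCG", "CGA"}


def classify_start(cds: str) -> str:
    """Label the first codon of a coding sequence.

    Returns "canonical", "putative" or "uncertain" (hypothetical labels).
    """
    codon = cds[:3].upper()
    if len(codon) < 3:
        return "uncertain"   # too short to hold a full codon
    if codon in CANONICAL:
        return "canonical"
    if codon in PUTATIVE:
        return "putative"
    return "uncertain"


# Made-up example sequences, for illustration only.
for seq in ("ATTCAACAAATC", "tcgcgacaatgg", "CGAAAATGACT", "AAACGA"):
    print(seq, classify_start(seq))
```

A lookup of this kind is only a first pass; as the text notes, the non-canonical assignments rest on the surrounding sequence context, not on the codon alone.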
Two additional samples, deriving from a species of the wasp family Ichneumonidae (Tryphoninae sp.) and a species of the collembolan family Isotomidae (Folsomina sp.), each possessing three T→C substitutions near the 3' end of the primer binding region, represent the only barcode records so far reported for these lineages. The nearest match to the species of Tryphoninae had just 87.6% similarity, while the nearest sequence to the species of Folsomina showed just 82.4% similarity.
The alignment of all 111 sequences revealed five cases of amino acid deletion. A block of three amino acids was deleted in three species of Pseudococcidae (Figure 4b). This same deletion occurs in all members of the scale insect families tested in this study, representing the first case of a three amino acid deletion in COI among the Hexapoda analyzed to date. Interestingly, there was a single amino acid deletion at a similar position in a species belonging to the hymenopteran family Tenthredinidae. A two amino acid deletion was found in a species of Pompilidae (Hymenoptera), but it occurred at a different position in the standard COI barcode region (amino acid position 175 from the initiation codon versus 131/133 amino acids for the scale insects and the tenthredinid).
As already noted, the new primer cocktail failed to amplify the barcode region in some Hexapoda. Its failure in these cases was likely due to shifts in the position of the tRNA-W gene, the existence of an intergenic gap between tRNA-W and COI, or sequence variation in the segment of tRNA-W targeted for primer binding. Nevertheless, the primer cocktail can be a useful supplement to the standard COI barcoding method. For example, we obtained barcodes from 94% of over a thousand Hemiptera specimens, representing over 200 species, with only two PCR amplification steps: the first run, performed with the standard primers, achieved only 82% success, and the second run applied the primer cocktail to the samples that failed in the first run (unpublished data). Additionally, the cocktail can reduce barcoding failures caused by the unexpected amplification of endosymbiont COI, which is occasionally encountered when barcoding Hemiptera.
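The two-step figures for Hemiptera imply how effective the cocktail was on the samples that failed in the first run. The following is a rough sketch in Python, assuming that both percentages are per-specimen success rates over the same set of specimens and ignoring any rounding in the reported values; the function name is ours.

```python
from fractions import Fraction

# Success rates reported in the text, held as exact fractions.
first_run = Fraction(82, 100)    # standard primers only
cumulative = Fraction(94, 100)   # after rerunning failures with the cocktail


def rescued_fraction(first: Fraction, total: Fraction) -> Fraction:
    """Fraction of first-run failures that were recovered by the second run."""
    return (total - first) / (1 - first)


rescue = rescued_fraction(first_run, cumulative)
print(rescue, round(float(rescue), 3))
```

Under these assumptions the cocktail recovered about two thirds of the specimens that the standard primers missed.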
The tRNA-W primer cocktail developed in this study successfully amplified the DNA barcode region for most hexapods, including many species which failed to generate an amplicon with the standard primer set. Use of this cocktail not only improves the success of barcode recovery, but also provides the information needed to design a primer set for problem groups. The current standard primer set for hexapods is not degenerate, so further study of the sequence upstream of COI promises to aid the development of a new primer cocktail with higher generality.
Mitochondrial genome analysis and primer design
All mitochondrial genome arrangements were assessed using the gene arrangement comparison tool on Mitome, the Mitochondrial Genome Database (http://www.mitome.info). tRNA-W gene sequences were retrieved from GenBank and aligned with Clustal W. Some manual adjustment was made to separate two distinct groups of tRNA-W sequences, each with high internal homogeneity. Two forward PCR primers, tRWF1 (5'-AAACTAATARCCTTCAAAG-3') and tRWF2 (5'-AAACTAATAATYTTCAAAATTA-3'), with M13 tails (5'-TGTAAAACGACGGCCAGT-3') on their 5' ends, were designed to best represent these two groups and were combined with a standard reverse primer (LepR1) to produce a cocktail (1:1 ratio) which was tested against 141 species representing 121 different families of Hexapoda. A new forward primer for scale insects, PcoF1 (5'-CCTTCAACTAATCATAAAAATATYAG-3'), was designed to reflect sequence diversity in scale insects at the positions targeted by the universal COI barcoding primers.
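Each of the primers above contains one degenerate position (R or Y), so each is in practice a mixture of two oligonucleotides. The Python sketch below illustrates this by expanding the standard IUPAC ambiguity codes and prepending the M13 tail; the primer sequences come from the text, while the function names and the dictionary layout are ours and are not part of the original design workflow.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Sequences as given in the text (5' to 3').
M13_TAIL = "TGTAAAACGACGGCCAGT"
PRIMERS = {
    "tRWF1": "AAACTAATARCCTTCAAAG",
    "tRWF2": "AAACTAATAATYTTCAAAATTA",
    "PcoF1": "CCTTCAACTAATCATAAAAATATYAG",
}


def expand(primer: str) -> list[str]:
    """List every non-degenerate sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def tailed(primer: str, tail: str = M13_TAIL) -> str:
    """Return the primer with the tail attached at its 5' end."""
    return tail + primer


for name, seq in PRIMERS.items():
    print(name, len(seq), expand(seq))

# Only the two tRNA-W primers are described as M13-tailed in the text.
for name in ("tRWF1", "tRWF2"):
    print(name, tailed(PRIMERS[name]))
```

Because each primer has a single two-fold degenerate site, each expands to exactly two sequences.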
Most of the specimens used in this study derived from collections made by BIO researchers at sites in North America over the past two years. Six scale insect samples were provided by the Central Post-entry Quarantine Station in South Korea. Jointly, these collections included 141 species representing 121 different families of Hexapoda (Additional file 4). All DNA was extracted from either dried or ethanol-fixed leg samples using a standard Glass Fibre extraction protocol, except for some small specimens which were processed as whole individuals. PCR thermocycling was done under the following conditions: 2 min at 95°C; 5 cycles of 40 sec at 94°C, 40 sec at 45°C, 70 sec at 72°C; 40 cycles of 40 sec at 94°C, 40 sec at 51°C, 70 sec at 72°C; 5 min at 72°C; held at 4°C.
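The thermocycling program can be written down as a small data structure, which also makes it easy to estimate the programmed run time. This Python sketch uses only the temperatures and durations stated above; the data layout and function name are ours, and the estimate deliberately excludes ramp times between temperatures and the final hold at 4°C, neither of which is specified in the text.

```python
# Each stage is (number of cycles, [(temperature in °C, seconds), ...]).
PROGRAM = [
    (1, [(95, 120)]),                       # initial denaturation, 2 min
    (5, [(94, 40), (45, 40), (72, 70)]),    # 5 cycles, annealing at 45 °C
    (40, [(94, 40), (51, 40), (72, 70)]),   # 40 cycles, annealing at 51 °C
    (1, [(72, 300)]),                       # final extension, 5 min
]


def total_seconds(program) -> int:
    """Sum the programmed hold times, ignoring ramping and the 4 °C hold."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)


seconds = total_seconds(PROGRAM)
print(seconds, "s =", seconds / 60, "min")
```

The programmed holds add up to 7170 s, or just under two hours, before ramping is taken into account.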
PCR, PCR check and DNA sequencing were carried out using standard methods. Contigs were assembled using CodonCode Aligner v2.0.6 (CodonCode Co.) and were subsequently aligned with the same software. The locations of the deletions were identified by manual editing. All sequences have been deposited in GenBank (accession numbers GU013562-GU013672 and GU936932-GU936957); the sequences, specimen and collection data, and trace files are available within the DIMC project files in BOLD (http://www.barcodinglife.org).
We thank Jayme Sones, David Porco and Xin Zhou for providing specimens or DNA extracts. We are also grateful to Justin Schonfeld and Robert Ward for comments on an earlier draft of this manuscript. This research was supported by a grant (FDM0401012) from the National Plant Quarantine Service to D-S Park and by grants from Genome Canada through the Ontario Genomics Institute to PDNH.
- Hebert PDN, Cywinska A, Ball SL, deWaard JR: Biological identifications through DNA barcodes. Proc Biol Sci. 2003, 270 (1512): 313-321. 10.1098/rspb.2002.2218.
- Hebert PDN, Gregory TR: The promise of DNA barcoding for taxonomy. Syst Biol. 2005, 54 (5): 852-859. 10.1080/10635150500354886.
- Hajibabaei M, Janzen DH, Burns JM, Hallwachs W, Hebert PDN: DNA barcodes distinguish species of tropical Lepidoptera. Proc Natl Acad Sci USA. 2006, 103 (4): 968-971. 10.1073/pnas.0510466103.
- Ward RD, Zemlak TS, Innes BH, Last PR, Hebert PDN: DNA barcoding Australia's fish species. Philos Trans R Soc Lond B Biol Sci. 2005, 360 (1462): 1847-1857. 10.1098/rstb.2005.1716.
- Sevilla RG, Diez A, Noren M, Mouchel O, Jerome M, Verrez-Bagnis V, van Pelt H, Favre-Krey L, Bautista JM: Primers and polymerase chain reaction conditions for DNA barcoding teleost fish based on the mitochondrial cytochrome b and nuclear rhodopsin genes. Mol Ecol Notes. 2007, 7: 730-734. 10.1111/j.1471-8286.2007.01863.x.
- Smith MA, Wood DM, Janzen DH, Hallwachs W, Hebert PDN: DNA barcodes affirm that 16 species of apparently generalist tropical parasitoid flies (Diptera, Tachinidae) are not all generalists. Proc Natl Acad Sci USA. 2007, 104 (12): 4967-4972. 10.1073/pnas.0700050104.
- Kondo T, Gullan PJ, Williams DJ: Coccidology. The study of scale insects (Hemiptera: Sternorrhyncha: Coccoidea). Revista Corpoica - Ciencia y Technologia Agropecuaria. 2008, 9 (2): 55-61.
- Lewis RL, Beckenbach AT, Mooers AO: The phylogeny of the subgroups within the melanogaster species group: likelihood tests on COI and COII sequences and a Bayesian estimate of phylogeny. Mol Phylogenet Evol. 2005, 37 (1): 15-24. 10.1016/j.ympev.2005.02.018.
- Simon C, Buckley TR, Frati F, Stewart JB, Beckenbach AT: Incorporating molecular evolution into phylogenetic analysis, and a new compilation of conserved polymerase chain reaction primers for animal mitochondrial DNA. Annu Rev Ecol Evol. 2006, 37: 545-579. 10.1146/annurev.ecolsys.37.091305.110018.
- Lee YS, Oh J, Kim YU, Kim N, Yang S, Hwang UW: Mitome: dynamic and interactive database for comparative mitochondrial genomics in metazoan animals. Nucleic acids research. 2008, 36 (Database): D938-942.
- Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R: DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol. 1994, 3 (5): 294-299.
- Clary DO, Wolstenholme DR: Genes for cytochrome c oxidase subunit I, URF2, and three tRNAs in Drosophila mitochondrial DNA. Nucleic acids research. 1983, 11 (19): 6859-6872. 10.1093/nar/11.19.6859.
- de Bruijn MH: Drosophila melanogaster mitochondrial DNA, a novel organization and genetic code. Nature. 1983, 304 (5923): 234-241. 10.1038/304234a0.
- Lunt DH, Zhang DX, Szymura JM, Hewitt GM: The insect cytochrome oxidase I gene: evolutionary patterns and conserved primers for phylogenetic studies. Insect Mol Biol. 1996, 5 (3): 153-165. 10.1111/j.1365-2583.1996.tb00049.x.
- Krzywinski J, Grushko OG, Besansky NJ: Analysis of the complete mitochondrial DNA from Anopheles funestus: an improved dipteran mitochondrial genome annotation and a temporal dimension of mosquito evolution. Mol Phylogenet Evol. 2006, 39 (2): 417-423. 10.1016/j.ympev.2006.01.006.
- Kim MI, Baek JY, Kim MJ, Jeong HC, Kim KG, Bae CH, Han YS, Jin BR, Kim I: Complete nucleotide sequence and organization of the mitogenome of the red-spotted apollo butterfly, Parnassius bremeri (Lepidoptera: Papilionidae) and comparison with other lepidopteran insects. Mol Cells. 2009, 28 (4): 347-363. 10.1007/s10059-009-0129-5.
- Pereira SL, Baker AJ: Low number of mitochondrial pseudogenes in the chicken (Gallus gallus) nuclear genome: implications for molecular inference of population history and phylogenetics. BMC Evol Biol. 2004, 4: 17. 10.1186/1471-2148-4-17.
- Song H, Buhay JE, Whiting MF, Crandall KA: Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proc Natl Acad Sci USA. 2008, 105 (36): 13486-13491. 10.1073/pnas.0803076105.
- Szafranski P: The mitochondrial trn-cox1 locus: rapid evolution in Pompilidae and evidence of bias in cox1 initiation and termination codon usage. Mitochondrial DNA. 2009, 20 (1): 15-25.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic acids research. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
- Ivanova NV, deWaard JR, Hebert PDN: An inexpensive, automation-friendly protocol for recovering high-quality DNA. Molecular Ecology Notes. 2006, doi:10.1111/j.1471-8286.2006.01428.x
- Fisher BL, Smith MA: A revision of Malagasy species of Anochetus mayr and Odontomachus latreille (Hymenoptera: Formicidae). PLoS One. 2008, 3 (5): e1787. 10.1371/journal.pone.0001787.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.