A universal DNA mini-barcode for biodiversity analysis
© Meusnier et al; licensee BioMed Central Ltd. 2008
Received: 19 December 2007
Accepted: 12 May 2008
Published: 12 May 2008
The goal of DNA barcoding is to develop a species-specific sequence library for all eukaryotes. A 650 bp fragment of the cytochrome c oxidase 1 (CO1) gene has been used successfully for species-level identification in several animal groups. In practice, however, it may be difficult to retrieve a 650 bp fragment from archival specimens (because of DNA degradation) or from environmental samples (where universal primers are needed).
We performed a bioinformatic analysis of all CO1 barcode sequences in GenBank and calculated the probability of obtaining species-specific barcodes for fragments of varying size. This analysis established the potential of much smaller fragments, mini-barcodes, for identifying unknown specimens. We then developed a universal primer set for the amplification of mini-barcodes, and successfully tested this primer set on a comprehensive set of taxa from all major eukaryotic groups as well as on archival specimens.
In this study we address the important issue of the minimum amount of sequence information required for identifying species in DNA barcoding. We establish a novel approach based on a much shorter barcode sequence and demonstrate its effectiveness in archival specimens. This approach will significantly broaden the application of DNA barcoding in biodiversity studies.
DNA barcoding seeks to develop a comprehensive species-specific sequence library for all eukaryotes. The 650 bp mitochondrial cytochrome c oxidase 1 (CO1, cox1) DNA barcode is easily sequenced and provides greater than 97% species-level specificity for birds, mammals, fishes, and various arthropods. However, conventional DNA barcoding encounters two problems. First, DNA degradation in archival specimens and processed biological material (e.g. food products) often prevents the recovery of PCR fragments longer than 200 bp, impeding barcode recovery [7–9]. Second, current approaches cannot be used for comprehensive analysis of environmental samples because high sequence variability necessitates the use of distinct primer sets for each major taxonomic group. In this study, we propose the use of a "mini-barcode" sequence to overcome these problems. We begin by identifying the minimum amount of sequence information required for accurate species identification. We then test the gain in amplification success for smaller fragments in specimens with degraded DNA. Finally, by targeting conserved priming sites within the barcode region, we develop primers with the universality required for the analysis of all major eukaryotes.
Results and Discussion
The mini-barcode system dramatically broadens the applications of DNA barcoding. We have now demonstrated that sequence information can be reliably obtained from archival specimens or those with degraded DNA. Further, the universality of the primers enables the recovery of comprehensive barcode information from environmental mixtures. Finally, the short universally-primed amplicon is ideal for sequence characterization through new parallelized high-throughput sequencing technologies, allowing inexpensive but comprehensive studies of biodiversity to be a realistic goal.
Metazoan CO1 sequences bearing the "BARCODE" keyword were downloaded from GenBank using the NCBI EFetch tool. Barcodes shorter than 650 bases were eliminated, leaving a dataset of 6,695 barcode sequences from 1,587 species. For 5'-end mini-barcodes of various sizes, ranging from 10 bases up to the full length of the barcode sequence, we counted the species that could be uniquely identified (to the exclusion of other species) using that sequence.
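The counting step described above can be illustrated with a minimal sketch. It assumes one plausible reading of "uniquely identified": a species counts as resolved at a given length only if none of its 5' prefixes of that length also occurs in another species. The function name `species_resolved` and the toy records are hypothetical and are not taken from the original analysis.

```python
from collections import defaultdict


def species_resolved(records, length):
    """Return (resolved, total) species counts for 5' prefixes of `length` bases.

    `records` is an iterable of (species_name, sequence) pairs. A species is
    treated as resolved only if no prefix it carries is shared with another
    species (an assumed interpretation of the criterion in the text).
    """
    owners = defaultdict(set)  # prefix -> set of species carrying that prefix
    all_species = set()
    for species, seq in records:
        all_species.add(species)
        owners[seq[:length].upper()].add(species)

    # Any species that shares at least one prefix with another is ambiguous.
    ambiguous = set()
    for carriers in owners.values():
        if len(carriers) > 1:
            ambiguous |= carriers
    return len(all_species - ambiguous), len(all_species)


# Toy data for illustration only; real input would be the filtered barcodes.
records = [
    ("sp_A", "ACGTACGTAA"),
    ("sp_A", "ACGTACGTAC"),
    ("sp_B", "ACGTTCGTAA"),
    ("sp_C", "ACGTACGGAA"),
]

for n in (4, 5, 8, 10):
    resolved, total = species_resolved(records, n)
    print(f"{n:>3} bases: {resolved}/{total} species resolved")
```

Running the same function over a range of lengths gives the resolution-versus-length curve that motivates the choice of a mini-barcode size.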
Specimens and their taxonomic coverage
All DNA extracts were obtained from different barcoding projects at the Canadian Centre for DNA Barcoding and from external collaborators. We selected these samples to maximize taxonomic range.
Primer design strategy
We selected the 5' end of the barcode region, targeting a 100–150 base amplicon. By comparing a wide range of taxa in this region, we found well-conserved strings of amino acids across all taxa at the priming sites. Interestingly, this high level of conservation is also evident at the nucleotide level. We designed multiple oligos using the Primer3 program, taking into account the physical and structural properties of the oligos (such as annealing temperature, G+C percentage, and self-complementarity). We selected the primer Uni-MinibarR1: 5'-GAAAATCATAATGAAGGCATGAGC-3' for further testing because it showed the highest similarity to other taxa, especially at the 3' end. A similar strategy was used to design a forward primer: Uni-MinibarF1: 5'-TCCACTAATCACAARGATATTGGTAC-3'. This primer lies in the same region as other commonly used barcoding primers. We attached M13 forward and reverse tails to our forward and reverse primers, respectively, to facilitate high-throughput sequencing. These tails did not decrease PCR success.
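As a rough illustration of the oligo properties mentioned above, the following sketch computes G+C percentage and a crude melting-temperature estimate for the two published primers, and expands the degenerate base R in the forward primer. The Wallace rule (2 °C per A/T, 4 °C per G/C) is used here only as a simple stand-in; it is not the method used by Primer3, and it is least reliable for oligos of this length. All function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

UNI_MINIBAR_F1 = "TCCACTAATCACAARGATATTGGTAC"  # from the text; R = A or G
UNI_MINIBAR_R1 = "GAAAATCATAATGAAGGCATGAGC"    # from the text


def expand(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def gc_percent(seq):
    """G+C content of a non-degenerate sequence, in percent."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Crude Tm estimate in degrees C (Wallace rule; an assumed stand-in)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a non-degenerate DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


for name, primer in (("Uni-MinibarF1", UNI_MINIBAR_F1),
                     ("Uni-MinibarR1", UNI_MINIBAR_R1)):
    for variant in expand(primer):
        print(name, variant, f"GC={gc_percent(variant):.1f}%",
              f"Tm~{wallace_tm(variant)}C")
```

Expanding the degenerate position first keeps the property calculations simple, since each variant contains only A, C, G, and T.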
PCR optimization strategy
PCR reactions were performed using a standard PCR pre-mix. We used the universal primer set described above in all reactions with a touch-up PCR program: 95°C for 2 min; 5 cycles of 95°C for 1 min, 46°C for 1 min, and 72°C for 30 sec; 35 cycles of 95°C for 1 min, 53°C for 1 min, and 72°C for 30 sec; and a final extension at 72°C for 5 min. We used a Mastercycler ep gradient S (Eppendorf, Mississauga, ON, Canada) thermal cycler. We included two negative control reactions (no DNA template) in every 96-well PCR plate. To compare the universal mini-barcode primer set with specific full-length primers, we amplified DNA extracts using taxonomically specific primer sets (e.g. 2 primer sets for fish species).
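The cycling program above can be written down as a small data structure, which makes the two annealing phases explicit and lets the programmed hold time be totalled. This is only an illustrative sketch: the `Step` and `Stage` classes and the function name are hypothetical, and the total ignores ramping between temperatures, which depends on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in degrees Celsius
    seconds: int   # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    steps: tuple   # Steps run in order within one cycle
    cycles: int = 1


# Temperatures and times as stated in the text.
TOUCH_UP_PROGRAM = (
    Stage((Step(95, 120),)),                                       # initial denaturation
    Stage((Step(95, 60), Step(46, 60), Step(72, 30)), cycles=5),   # low-stringency annealing
    Stage((Step(95, 60), Step(53, 60), Step(72, 30)), cycles=35),  # higher-stringency annealing
    Stage((Step(72, 300),)),                                       # final extension
)


def hold_time_seconds(program):
    """Sum of all programmed hold times, excluding temperature ramps."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps)
               for stage in program)


total = hold_time_seconds(TOUCH_UP_PROGRAM)
print(f"Programmed hold time: {total} s ({total / 60:.0f} min)")
```

The first cycling stage anneals at 46°C and the second at 53°C, which is the "touch-up" pattern: early cycles tolerate primer-template mismatches, later cycles are more stringent.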
PCR amplification verification and sequencing
PCR products were visualized on a 2% E-gel® 96 Agarose (Invitrogen, Burlington, ON, Canada). The presence of bands on the E-gel was used as the measure of PCR success. To verify amplification of the target region, we sequenced 747 PCR products from at least 363 species. Standard BigDye kits (Applied Biosystems, Foster City, CA, USA) were used for sequencing. Sequencing reactions were cleaned up using the Agencourt® CleanSEQ® kit (Agencourt Bioscience Corporation, Beverly, MA, USA). The products were sequenced bidirectionally on a 3730xl DNA analyzer (Applied Biosystems, Foster City, CA, USA), and the sequences were edited with Sequencher™ (Gene Codes Corporation, Ann Arbor, MI, USA) and aligned using BioEdit.
We thank Elizabeth Clare, Robin Floyd, Kevin Kerr, Natalia Ivanova, Gary Saunders, Alex Smith, Magali Sole, Dirk Steinke, and Xin Zhou for providing DNA extracts. This work is supported by grants from Genome Canada (through the Ontario Genomics Institute) and Natural Sciences and Engineering Research Council of Canada to the Canadian Barcode of Life Network.
- Marshall E: Taxonomy. Will DNA bar codes breathe life into classification? Science. 2005, 307 (5712): 1037. 10.1126/science.307.5712.1037.
- Hebert PD, Cywinska A, Ball SL, deWaard JR: Biological identifications through DNA barcodes. Proc Biol Sci. 2003, 270 (1512): 313-321. 10.1098/rspb.2002.2218.
- Hebert PD, Stoeckle MY, Zemlak TS, Francis CM: Identification of birds through DNA barcodes. PLoS Biology. 2004, 2 (10): E312. 10.1371/journal.pbio.0020312.
- Hajibabaei M, Singer GA, Clare EL, Hebert PDN: Design and applicability of DNA arrays and DNA barcodes in biodiversity monitoring. BMC Biol. 2007, 5: 24. 10.1186/1741-7007-5-24.
- Ward RD, Zemlak TS, Innes BH, Last PR, Hebert PD: DNA barcoding Australia's fish species. Philos Trans R Soc Lond B Biol Sci. 2005, 360 (1462): 1847-1857. 10.1098/rstb.2005.1716.
- Hajibabaei M, Janzen DH, Burns JM, Hallwachs W, Hebert PDN: DNA barcodes distinguish species of tropical Lepidoptera. Proceedings of the National Academy of Sciences of the United States of America. 2006, 103 (4): 968-971. 10.1073/pnas.0510466103.
- Goldstein PZ, Desalle R: Calibrating phylogenetic species formation in a threatened insect using DNA from historical specimens. Molecular Ecology. 2003, 12 (7): 1993-1998. 10.1046/j.1365-294X.2003.01860.x.
- Hajibabaei M, Smith MA, Janzen DH, Rodriguez JJ, Whitfield JB, Hebert PDN: A minimalist barcode can identify a specimen whose DNA is degraded. Molecular Ecology Notes. 2006, 6: 959-964. 10.1111/j.1471-8286.2006.01470.x.
- Wandeler P, Hoeck PE, Keller LF: Back to the future: museum specimens in population genetics. Trends Ecol Evol. 2007, 22 (12): 634-642. 10.1016/j.tree.2007.08.017.
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
- The Canadian Centre for DNA Barcoding. [http://www.dnabarcoding.ca/pa/ge/research/protocols/amplification]
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4 (4): 406-425.
- Kimura M: A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution. 1980, 16 (2): 111-120. 10.1007/BF01731581.