Bioinformatics analysis
Metazoan COI sequences bearing the "BARCODE" keyword were downloaded from GenBank using the NCBI eFetch tool. Barcodes shorter than 650 bases were excluded, leaving a dataset of 6,695 barcode sequences from 1,587 species. For 5'-end minibarcodes ranging in size from 10 bases up to the full length of the barcode sequence, we determined the number of species that could be uniquely identified (to the exclusion of other species) using that sequence.
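The counting step described above can be illustrated with a minimal Python sketch. It is not the script used in the study: the function names, the (species, sequence) input format, and the rule that a species counts as uniquely identified only when none of its 5' prefixes occurs in any other species are assumptions made for illustration.

```python
from collections import defaultdict

MIN_BARCODE_LEN = 650  # length cutoff stated in the text


def filter_full_length(records, min_len=MIN_BARCODE_LEN):
    """Keep (species, sequence) pairs whose sequence has at least min_len bases."""
    return [(sp, seq.upper()) for sp, seq in records if len(seq) >= min_len]


def uniquely_identified_species(records, k):
    """Return the species whose k-base 5' prefixes are shared with no other species.

    Assumption: a species is 'uniquely identified' only if every one of its
    prefixes is absent from all other species in the dataset.
    """
    records = list(records)
    prefix_to_species = defaultdict(set)
    for species, seq in records:
        prefix_to_species[seq[:k]].add(species)

    ambiguous = set()
    for species_set in prefix_to_species.values():
        if len(species_set) > 1:  # prefix occurs in two or more species
            ambiguous |= species_set

    all_species = {sp for sp, _ in records}
    return all_species - ambiguous


if __name__ == "__main__":
    # Toy data only; real input would be the filtered GenBank barcodes.
    toy = [("sp_A", "ACGTAC"), ("sp_B", "ACGTTT"), ("sp_C", "GGGTTT")]
    for k in (4, 5):
        print(k, sorted(uniquely_identified_species(toy, k)))
```

Running the function over a range of values of k (10 bases up to the full barcode length) would give the number of resolvable species as a function of minibarcode size.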
Specimens and their taxonomic coverage
All DNA extracts were obtained from barcoding projects at the Canadian Centre for DNA Barcoding and from external collaborators. We selected samples to maximize taxonomic range.
Primer design strategy
We selected the 5' end of the barcode region, targeting a 100–150 base amplicon. By comparing a wide range of taxa in this region, we found well-conserved strings of amino acids across all taxa at the priming sites. Interestingly, this high level of conservation is also evident at the nucleotide level. We designed multiple oligos using the Primer3 program [10], taking into account their physical and structural properties (such as annealing temperature, G+C percentage, and self-complementarity). We selected the primer Uni-MinibarR1: 5'-GAAAATCATAATGAAGGCATGAGC-3' for further testing because it showed the highest similarity to other taxa, especially at the 3' end. A similar strategy was used to design a forward primer, Uni-MinibarF1: 5'-TCCACTAATCACAARGATATTGGTAC-3'. This primer lies in the same region as other common barcoding primers. We attached M13 forward and reverse tails to our forward and reverse primers, respectively, to facilitate high-throughput sequencing. These tails did not reduce PCR success.
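Two of the oligo properties mentioned above, length and G+C percentage, can be checked for the published primer sequences with a short Python sketch. This is an illustration only and does not reproduce Primer3's calculations; the helper names are hypothetical, and the degenerate base R in the forward primer is expanded to A and G following the standard IUPAC code.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

UNI_MINIBAR_F1 = "TCCACTAATCACAARGATATTGGTAC"  # forward primer from the text
UNI_MINIBAR_R1 = "GAAAATCATAATGAAGGCATGAGC"    # reverse primer from the text


def expand_degenerate(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def gc_percent(seq):
    """G+C content of a non-degenerate sequence, as a percentage."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for name, primer in (("Uni-MinibarF1", UNI_MINIBAR_F1),
                         ("Uni-MinibarR1", UNI_MINIBAR_R1)):
        for variant in expand_degenerate(primer):
            print(f"{name}\t{variant}\t{len(variant)} nt\t{gc_percent(variant):.1f}% GC")
```

Annealing temperature and self-complementarity depend on thermodynamic parameters and are best taken from Primer3 itself rather than from a simplified calculation like this one.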
PCR optimization strategy
PCR reactions were performed using a standard PCR pre-mix [11]. We used the above-mentioned universal primer set in all reactions with a touch-up PCR program: 95°C for 2 min; 5 cycles of 95°C for 1 min, 46°C for 1 min, and 72°C for 30 sec; 35 cycles of 95°C for 1 min, 53°C for 1 min, and 72°C for 30 sec; and a final extension at 72°C for 5 min. Reactions were run on a Mastercycler ep gradient S thermal cycler (Eppendorf, Mississauga, ON, Canada). We included two negative control reactions (no DNA template) in every 96-well PCR plate. To compare the universal mini-barcode primer set with specific full-length primers, we also amplified DNA extracts using taxonomically specific primer sets (i.e., 2 primer sets for fish species) [11].
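The touch-up program above can be written out as a small data structure, which makes the cycle counts and programmed hold time easy to verify. The sketch below is illustrative only: the class and function names are hypothetical, and the total ignores ramp time between temperatures, which depends on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    cycles: int    # number of times the steps are repeated
    steps: tuple   # Step objects run in order within each cycle


# Temperatures, times, and cycle counts taken from the program in the text.
TOUCH_UP_PROGRAM = (
    Stage(1, (Step(95, 120),)),                            # initial denaturation
    Stage(5, (Step(95, 60), Step(46, 60), Step(72, 30))),  # low-annealing cycles
    Stage(35, (Step(95, 60), Step(53, 60), Step(72, 30))), # higher-annealing cycles
    Stage(1, (Step(72, 300),)),                            # final extension
)


def total_hold_seconds(program):
    """Sum of all programmed hold times, excluding ramping between steps."""
    return sum(
        stage.cycles * sum(step.seconds for step in stage.steps)
        for stage in program
    )


def amplification_cycles(program):
    """Number of cycles in multi-step (denature/anneal/extend) stages."""
    return sum(stage.cycles for stage in program if len(stage.steps) > 1)


if __name__ == "__main__":
    seconds = total_hold_seconds(TOUCH_UP_PROGRAM)
    print(f"{amplification_cycles(TOUCH_UP_PROGRAM)} cycles, "
          f"{seconds / 60:.0f} min of programmed hold time")
```

With the values from the text, this comes to 40 amplification cycles and 107 min of hold time (2 + 5 × 2.5 + 35 × 2.5 + 5 min).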
PCR amplification verification and sequencing
PCR products were visualized on a 2% E-gel® 96 Agarose (Invitrogen, Burlington, ON, Canada). The presence of a band on the E-gel was used as the measure of PCR success. To verify amplification of the target region, we sequenced 747 PCR products from at least 363 species. Standard BigDye kits (Applied Biosystems, Foster City, CA, USA) were used for sequencing. Sequencing reactions were cleaned up using the Agencourt® CleanSEQ® kit (Agencourt Bioscience Corporation, Beverly, MA, USA). The products were sequenced bidirectionally on a 3730xl DNA analyzer (Applied Biosystems, Foster City, CA, USA), and the resulting sequences were edited with Sequencher™ (Gene Codes Corporation, Ann Arbor, MI, USA) and aligned using BioEdit version 7.0.5.3.