Technological advances in polymorphism detection and genotyping have made single nucleotide polymorphisms (SNPs) the markers of choice for many high-density genotyping studies [1, 2]. High-throughput microarrays containing assays for thousands of SNPs are becoming available for a number of non-model organisms [1–3] and are being used more frequently in ecological and evolutionary studies, including population genetic studies e.g. [4–7], QTL identification e.g. , parentage determination e.g. [9–11], and mixed stock analysis e.g. [12–15].
Despite recent technical advances, genotyping large numbers of individuals with thousands of SNPs remains prohibitively expensive for many research groups. Furthermore, many population genetic studies are based on population allele frequencies rather than individual genotype data. Determination of allele frequencies from pooled DNA samples, i.e. ‘allelotyping’, was therefore suggested more than 30 years ago as a cost-effective alternative to individual genotyping (reviewed by Sham et al. ). Several genome-wide association studies have successfully used this approach to compare allele frequencies between cases and controls e.g. [17–23]. These studies have demonstrated satisfactory accuracy and repeatability, and the DNA pooling approach can reduce costs by as much as 100-fold depending on the number of samples [16, 21, 23].
While allelotyping of DNA pools can substantially reduce costs compared with genotyping each sample individually, this approach is not without disadvantages. First, various sources of error occur during allele frequency estimation from DNA pools. According to Earp et al. , variation introduced to allele frequency estimates can be divided into four categories: (i) within array; (ii) between arrays; (iii) between independently constructed identical pools; and (iv) between pools constructed from different individuals of the same population (biological replicates). Therefore, to obtain reliable allele frequency estimates using DNA pooling, it is important to evaluate the magnitude and relative importance of the different sources of error [23, 24]. Second, DNA pooling generally does not provide information about haplotype frequencies, and despite recent computational improvements [25, 26], resolving the phase ambiguity remains a challenge for large numbers of loci . Finally, despite the popularity of DNA pooling in genetic association studies, only a few studies to date have used the allelotyping approach to characterize inter-population variation e.g. .
Here, we tested for the first time the usefulness of DNA pooling with an Atlantic salmon (Salmo salar L.) Illumina SNP-chip for obtaining accurate allele frequency estimates for multiple Atlantic salmon populations, and evaluated the importance of different sources of error arising from allelotyping. First, we assessed the effect of DNA pool construction and between-array variation on allele frequency estimates. Subsequently, we evaluated how allele frequency estimation was affected by cluster separation scores (a parameter that summarizes the separation of the three genotype classes in the theta dimension), two alternative sources of theta (a value between 0 and 1 that defines the genotype; 0 = AA, 1 = BB, 0.5 = AB) and DNA pool size. Finally, two alternative quality control (QC) filters were tested to select optimal sets of SNP loci for subsequent population genetic analysis.
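To illustrate how a theta value might be converted into an allele frequency estimate for a DNA pool, the following is a minimal sketch only. It assumes piecewise-linear interpolation between the theta positions of the three genotype clusters, which is one common way to do this and not necessarily the procedure used in this study. The function name and the default cluster positions (the idealized values 0, 0.5 and 1 given above) are hypothetical.

```python
def pooled_b_allele_frequency(theta_pool, theta_aa=0.0, theta_ab=0.5, theta_bb=1.0):
    """Estimate the B-allele frequency of a DNA pool from its theta value.

    Assumption (not taken from the study): the frequency is interpolated
    linearly between the genotype cluster centres, so that
    theta_aa -> 0.0, theta_ab -> 0.5 and theta_bb -> 1.0.
    """
    if not (theta_aa < theta_ab < theta_bb):
        raise ValueError("cluster centres must satisfy theta_aa < theta_ab < theta_bb")

    # Clamp the pooled theta to the range spanned by the homozygote clusters.
    theta = min(max(theta_pool, theta_aa), theta_bb)

    if theta <= theta_ab:
        # Lower half: between the AA and AB cluster centres.
        return 0.5 * (theta - theta_aa) / (theta_ab - theta_aa)
    # Upper half: between the AB and BB cluster centres.
    return 0.5 + 0.5 * (theta - theta_ab) / (theta_bb - theta_ab)


if __name__ == "__main__":
    # Idealized cluster positions: theta maps directly onto allele frequency.
    print(pooled_b_allele_frequency(0.25))
    # Hypothetical SNP whose clusters are shifted away from 0, 0.5 and 1.
    print(pooled_b_allele_frequency(0.65, theta_aa=0.1, theta_ab=0.4, theta_bb=0.9))
```

Using SNP-specific cluster centres, for example taken from individually genotyped reference samples, rather than the idealized defaults is one way the "source of theta" could influence the resulting estimate.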