A microarray platform and novel SNP calling algorithm to evaluate Plasmodium falciparum field samples of low DNA quantity
© Jacob et al.; licensee BioMed Central Ltd. 2014
Received: 31 March 2014
Accepted: 11 August 2014
Published: 26 August 2014
Analysis of single nucleotide polymorphisms (SNPs) derived from whole-genome studies allows for rapid evaluation of genome-wide diversity, and genomic epidemiology studies of Plasmodium falciparum provide insights into parasite population structure, gene flow, drug resistance and vaccine development. In areas with adequate cold chain facilities, large volumes of leukocyte-depleted patient blood can be frozen for use in parasite genomic analyses. In more remote endemic areas smaller volumes of infected blood are taken by finger prick, and dried and stored on filter paper. These dried blood spots do not generally yield enough concentrated parasite DNA for whole-genome sequencing.
A DNA microarray was designed for use on field samples to type a genome-wide set of SNPs which prior sequencing had shown to be variable in Africa, Southeast Asia, and Papua New Guinea. An algorithm was designed to call SNPs in samples with low parasite DNA. With this new algorithm, a SNP-calling accuracy of 98% was measured by hybridizing purified DNA from malaria lab strains and comparing calls with SNPs called from full genome sequences. An average accuracy of >98% was likewise obtained for DNA extracted from malaria field samples collected in studies in Southeast Asia, with an average call rate of >82%.
This new high-density microarray provided high quality SNP calls from a wide range of parasite DNA quantities, and represents a robust tool for genome-wide analysis of malaria parasites in diverse settings.
Keywords: Plasmodium falciparum, Malaria, Microarray
Tools for assessing genetic diversity in malaria parasites are of potential use for the discovery of novel malaria vaccine antigens, and for understanding the molecular basis of drug resistance [2–6]. Recent advances in genome sequencing technology offer high-throughput methods for obtaining full genomic data [7, 8]. However, these technologies still require relatively large quantities of high quality DNA. Microarrays are able to tolerate DNA of the lower quantity and quality typical of patient-derived field samples, which contain far more human than parasite DNA. Thus, field samples unsuitable for full-genome sequencing may be amenable to microarray analysis. Microarrays can also be adapted to incorporate new loci, to determine copy number polymorphism, or to answer specific research questions.
Microarrays have been developed for genotyping Plasmodium falciparum, using a variety of platforms that detect single nucleotide polymorphisms (SNPs) at loci that had been described at the time of platform design [9–12]. Many novel SNPs were discovered in a large scale P. falciparum sequencing project, resulting in a map of population genomic variation. We sought to create a microarray that would be able to genotype highly informative loci within this genomic map in field samples that were not amenable to next-generation sequencing. We also wanted a high-throughput system capable of working with DNA extracted from low volume blood samples. We chose to use a custom NimbleGen 4.2 million probe design in multiplex format. This platform comprises 12 identical arrays on one slide, each capable of genotyping 33,716 loci within the P. falciparum genome. When dual-color labeling is used, two samples can be hybridized to a single array, yielding 33,716 SNPs for each of 24 samples in a single 2-day experiment. Several such slides can be run simultaneously, making this approach relatively high-throughput and inexpensive.
Heuristic base calling
Table 1. SNP call accuracy and SNP call rate of 3D7 cultured parasites (columns: SNP call accuracy, SNP call rate)
This algorithm uses the global array average for probes with identical center positions to adjust individual probe intensities. A call is made only when the sense and antisense calls agree, the contrast between the two alleles is at least 0.98, and all intensities are greater than the random binding threshold. This heuristic algorithm outperformed the D-score method in the mixing experiment of cultured 3D7 parasites, raising the average accuracy of these samples to >98% and increasing the accuracy of the 1,000 parasites/μl samples by as much as 25%. We also observed an increase in SNP call rates when using this heuristic algorithm.
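To make the two reported metrics concrete, the following is a minimal sketch of how call rate and accuracy could be computed for one sample. It assumes that call rate is the fraction of loci with a call and that accuracy is the fraction of called loci agreeing with the sequence-derived genotype; the article does not spell out these denominators, and the function name `call_metrics` and the dictionary layout are hypothetical.

```python
def call_metrics(calls, reference):
    """Return (call_rate, accuracy) for one sample.

    calls:     dict mapping locus id -> called base, or None for a no-call
    reference: dict mapping locus id -> base from whole-genome sequencing
    Assumption: accuracy is computed only over loci that were called
    and that also have a reference genotype.
    """
    if not calls:
        return float("nan"), float("nan")
    called = {k: v for k, v in calls.items() if v is not None}
    call_rate = len(called) / len(calls)
    compared = [k for k in called if k in reference]
    if not compared:
        return call_rate, float("nan")
    matches = sum(called[k] == reference[k] for k in compared)
    return call_rate, matches / len(compared)
```

Under these assumptions, a sample with four calls out of five loci, three of which match the reference, has a call rate of 0.8 and an accuracy of 0.75.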
Table 2. SNP call accuracy and SNP call rate of NF54 purified DNA (column: Parasite DNA (ng))
Analysis of field isolates and leukocyte depletion
We noted a difference for low volume samples that had undergone leukocyte depletion to remove human DNA prior to DNA extraction. Leukocyte depletion removes human white blood cells from a whole blood sample, enriching the final sample for red blood cells, which harbor the parasite and consequently its DNA. In the analysis of cultured 3D7 parasites diluted in whole blood, a subset of samples was leukocyte depleted using CF11 cellulose columns. For the 1,000 parasites/μl samples we saw an average increase in accuracy of 4.2% and an average increase in SNP call rate of 45.9% (Table 1). This increase was more pronounced in non-whole genome amplified (WGA) samples.
Effects of whole-genome amplification
Samples with low DNA quantity can be subjected to WGA to increase the amount of DNA, allowing more samples to pass preliminary QC metrics. Several cultured and purified DNA samples underwent WGA prior to genotyping and were used to evaluate possible bias when undergoing amplification. Accuracies in cultured parasites showed no marked difference (≤0.2%) while WGA slightly increased the call rate overall (Table 1). Two NF54 DNA samples were subjected to WGA prior to microarray analysis, yielding an average accuracy of 99.54% and an average call rate of 88.34%, about 10% higher than the non-WGA samples (Table 2).
We set out to develop a SNP genotyping platform for use on P. falciparum field samples of low DNA quantity and quality, and evaluated this tool using parasite DNA from cultured parasites and preserved venous blood collected from malaria-infected individuals in field studies. An earlier microarray used 250 ng of DNA as the minimum level tested, although lower DNA quantities were not systematically evaluated. Our heuristic SNP-calling algorithm was capable of accounting for a bias discovered when using lower quantity samples and yields highly accurate results for samples with as little as 37 ng of parasite DNA in field samples and 10 ng in purified DNA samples. We also obtained high accuracy with filter paper samples, albeit with lower reproducibility (not shown). Addition of a WGA step and a strict DNA quantity threshold may improve reproducibility for filter paper blood spots.
DNA microarrays have previously been used for SNP detection in P. falciparum [9–11, 15]. With the increased availability of whole genome sequence data thanks to lower costs and improved methodology, the value of DNA microarrays can be questioned. However, for valuable field samples that are associated with important demographic, clinical and other phenotypic data such as drug resistance or vaccine resistance, microarrays can rescue the high proportion of samples that fail to meet criteria for sequencing. Samples that often do not meet sequencing standards include low volume blood samples, filter paper blood spots, samples with high levels of human DNA contamination, and archived or degraded samples. It was our goal to create a SNP microarray that can complement genome sequence data in association and population genetics studies, providing genome-wide SNP data for samples not fit for sequencing. The SNPs chosen were based on a set of highly validated loci identified by sequencing hundreds of global isolates of P. falciparum. This large sequencing project shows how microarrays still have a place in genomic research. The first iteration of samples prepared for sequencing had a QC pass rate of ~30%; more recent samples had a pass rate of >50%. This leaves a significant number of samples with no data. By selecting SNPs from within this highly validated list of loci we can fill in the gaps from sequencing while continuing to use samples for which sequencing was successful. The cost of genome-wide microarray analysis (currently less than $90/sample excluding personnel costs) compares favorably with that of genome sequencing (at least 3-fold higher for short-read Illumina sequencing and approximately 25-fold higher for third generation sequencing and assembly). Imputation strategies are being developed that may further increase the value of microarray-generated SNP information.
This microarray genotypes over 33,000 variable positions in the P. falciparum genome with high accuracy and high throughput. The ability to run samples with as little as 10 ng of parasite DNA increases the number of field samples for which whole genome analysis is possible. The selection of loci also allows samples genotyped on this microarray to be analyzed in conjunction with higher quality samples sequenced using next-generation sequencing platforms.
SNP selection and chip design
A pooled set of all possible SNPs was gathered from Version 1.0 of the MalariaGEN Community Project and a SNP set from an Affymetrix array used in a previous study. This pooled set was filtered based on the proportion of missing data in previous data sets, hyper-heterozygosity, and minor allele frequency (MAF) in Southeast Asia and Africa ≥1%. SNPs were prioritized by their MAF, with precedence given to those with higher MAF values. SNPs given the highest priority were those that were variable in multiple populations. SNPs were ranked in order of priority and then a final filter of minimum inter-SNP distance was applied. To decrease the number of large genomic gaps, lower priority SNPs were given preference over high priority SNPs that were within 22 bp of another high priority SNP. The 22 bp value was determined by systematically increasing the inter-SNP distance threshold and measuring the number of large genomic gaps and the total number of loci.
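The ranking and spacing filter described above can be illustrated with a simple greedy pass. This is a sketch under stated assumptions rather than the selection code used for the array: it uses a single MAF value per SNP (the article filters on MAF in both Southeast Asia and Africa), treats two SNPs closer than 22 bp on the same chromosome as conflicting, and omits the missing-data and hyper-heterozygosity filters. The function name `select_snps` and the tuple layout are hypothetical.

```python
def select_snps(snps, min_maf=0.01, min_dist=22):
    """Greedy SNP selection by priority with a minimum spacing.

    snps: list of (chromosome, position, maf) tuples.
    Assumption: priority is MAF alone, highest first; a candidate is
    rejected if an already selected SNP on the same chromosome lies
    fewer than min_dist bases away.
    """
    # Keep only SNPs that pass the MAF threshold, highest MAF first.
    candidates = sorted(
        (s for s in snps if s[2] >= min_maf), key=lambda s: -s[2]
    )
    kept = []
    for chrom, pos, maf in candidates:
        if all(c != chrom or abs(p - pos) >= min_dist for c, p, _ in kept):
            kept.append((chrom, pos, maf))
    # Return in genomic order for downstream probe design.
    return sorted(kept)
```

Because a rejected high-MAF SNP frees its neighborhood for a lower-MAF SNP further along, a pass of this kind tends to trade a few top-ranked loci for more even genome coverage, which is the effect the 22 bp threshold was tuned for.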
The probes designed for the array were mostly low GC content (Figure 4C) and of variable lengths (29–41 bp) (Figure 4D). Variable length probes provide higher average intensities when compared to static length probes, and the low GC content reflects the AT richness of the P. falciparum genome. Standard NimbleGen probes were also placed on the array, including alignment probes, chip identification probes, and random cDNA probes designed to match the content of our user designed probes. The average intensity of 8,421 random probes was used to define the global array background.
The NimbleGen 4.2 M Probe Custom DNA Microarray comes in varying formats, from which we selected the highest number of plexes (12) to increase the number of samples for use in high-throughput field studies. Probe length was determined by cDNA melting temperature of the sequence surrounding each SNP. For every SNP there were eight probes, four in the sense direction with each base at the center position and four in the anti-sense direction. Probe quartets were arranged sequentially on the array to avoid variation due to chip defects and spatial bias. This array is no longer available as NimbleGen has discontinued custom microarray production. Steps are being taken to transfer this probe set and analysis pipeline to a new commercial platform.
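The eight-probe layout per SNP can be sketched as follows. This is an illustration only: it takes the flanking sequence as given, whereas on the actual array the flank lengths were set by melting temperature, and the names `revcomp` and `probe_octet` are hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_octet(left_flank, right_flank):
    """Build the eight probes for one SNP.

    Returns (sense, antisense): four sense probes with A, C, G and T
    at the SNP position, and the reverse complement of each.
    Assumption: the flanks are already trimmed to the desired length.
    """
    sense = [left_flank + base + right_flank for base in "ACGT"]
    antisense = [revcomp(probe) for probe in sense]
    return sense, antisense
```

For example, `probe_octet("AT", "GC")` gives the sense probes `ATAGC`, `ATCGC`, `ATGGC` and `ATTGC`, and the antisense partner of `ATAGC` is `GCTAT`.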
DNA labeling and hybridization
Sample DNA was concentrated using vacuum centrifugation to a volume of 30–50 μl and heat denatured with 1 OD of 65% random nonamers labeled with Cy3 or Cy5 for 10 minutes at 98°C. Denatured DNA was chilled on ice for 2 minutes and then incubated for 2 hours at 37°C in the presence of 50 units of Klenow fragment and a 50X dNTP mixture. The reaction was terminated with 0.5 M EDTA and DNA was precipitated with 5 M NaCl and isopropanol. Labeled DNA was washed 2–3 times with 80% ice cold ethanol to remove unincorporated dye. After removal of ethanol, samples were rehydrated in water and Cy3 and Cy5 labeled samples were combined for multiplexing. Samples were vacuum-concentrated and resuspended in a buffer containing NimbleGen alignment oligo and 1X Denhardt’s solution. The samples were heat denatured at 95°C for 5 minutes and stabilized at 42°C prior to loading onto the array. Samples were hybridized on a NimbleGen hybridization station for 16–24 hours at 42°C. Slides were disassembled in a dish containing Wash Buffer 1 at 42°C and washed in Wash Buffer 1, Wash Buffer 2, and Wash Buffer 3 for 2 minutes, 1 minute, and 15 seconds respectively. Slides were washed and subsequently dried in the SlideWasher 12 Array Processing System. Microarrays were scanned with a NimbleGen MS 200 Microarray Scanner at 2 μm using “autogain” to automatically adjust scanning parameters on an individual array basis.
Heuristic base calling algorithm
Each SNP typed by this array is at a bi-allelic locus, as determined from extensive sequencing of global P. falciparum isolates. The heuristic algorithm therefore focuses on the intensities of the two possible alleles. The algorithm identifies the global mean intensity for every probe with the identical center base and adjusts each individual intensity by the difference in intensity between the global means of the two bases being interrogated. Intensities are evaluated and adjusted independently for the sense and anti-sense direction. After adjusting the intensity, a SNP is called if it fulfills the following criteria: 1) the contrast of intensities is greater than or equal to 0.98, 2) the forward and reverse SNP calls are concordant, and 3) all intensities are above global background levels determined by the average value of the random probes. Multiple thresholds were tested for background levels up to average random plus two standard deviations (data not shown). SNP calling accuracy changed minimally when thresholds were raised; however, SNP call rate dropped more significantly. This algorithm was written in the PERL programming language and uses standard outputs from the Roche NimbleScan (v2.6) software. Given the discontinuation of this microarray, the algorithm will be made publicly available when the probe set is validated on a new platform.
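The three calling criteria can be sketched in a few lines. This is not the authors' PERL implementation, which has not been released, and several details are assumptions made for illustration: the contrast formula (hi - lo)/(hi + lo) is hypothetical because the article does not define contrast; the adjustment shifts the second allele by the difference of the two global means; the background test is applied to the raw intensities of the two candidate alleles; and antisense intensities are assumed to be keyed by the sense-strand base they report. All function names are hypothetical.

```python
CONTRAST_MIN = 0.98  # threshold stated in the text


def contrast(hi, lo):
    """Hypothetical contrast measure; the article gives no formula."""
    total = hi + lo
    return (hi - lo) / total if total > 0 else 0.0


def call_strand(intens, alleles, global_means, background):
    """Call one strand; returns a base, or None for a no-call.

    intens:       dict base -> raw probe intensity at this SNP
    alleles:      the two bases expected at this bi-allelic locus
    global_means: dict base -> array-wide mean for that center base
    background:   mean intensity of the random probes
    """
    a, b = alleles
    # Criterion 3: both allele intensities must exceed background.
    if intens[a] <= background or intens[b] <= background:
        return None
    # Assumed adjustment: put allele b on the same scale as allele a.
    shift = global_means[a] - global_means[b]
    adjusted = {a: intens[a], b: intens[b] + shift}
    hi, lo = (a, b) if adjusted[a] >= adjusted[b] else (b, a)
    # Criterion 1: contrast between the two alleles.
    if contrast(adjusted[hi], adjusted[lo]) < CONTRAST_MIN:
        return None
    return hi


def call_snp(sense, antisense, alleles, means_sense, means_anti, background):
    """Criterion 2: sense and antisense calls must be concordant."""
    s = call_strand(sense, alleles, means_sense, background)
    t = call_strand(antisense, alleles, means_anti, background)
    return s if s is not None and s == t else None
```

Handling each strand separately mirrors the statement that intensities are evaluated and adjusted independently for the sense and anti-sense directions; any of the assumed pieces, in particular the contrast function, can be swapped without changing the overall structure.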
P. falciparum (3D7 strain) was grown in culture under standard conditions from stocks procured from the MR4 repository. Base parasitemia was determined by light microscopy and dilutions were made using human whole blood. Dilutions were made to simulate 1,000, 10,000, 100,000, and 500,000 parasites/μl. Leukocyte depletion was performed on a subset of samples using 2.5 mL of blood in a CF11 column. Insufficient parasites were acquired to fulfill the necessary 2.5 mL of blood for leukocyte depletion of the 100,000 and 500,000 parasites/μl mixtures. DNA extraction of these samples was performed with the Qiagen mini-kit. Dd2 and 3D7 purified DNA was also received from the MR4 repository. Purified NF54 parasite DNA was generously donated by Sanaria Inc. in Rockville MD. Whole blood samples from field isolates came from studies conducted in Southeast Asia [Takala-Harrison et al. under review]. A subset of samples was whole-genome amplified prior to experimentation using a Qiagen REPLI-g mini kit. Simple linear regression was used to test the relationship of DNA quantity with SNP call rate and SNP call accuracy. Informed consent was obtained from participants or their parents or guardians for samples collected as part of clinical trials following protocols approved by the Research Ethics Review Committee of the World Health Organization and local Institutional Review Boards (IRBs); samples were analyzed following a protocol approved by the University of Maryland, Baltimore IRB.
The P. falciparum gene encoding the 18S ribosomal subunit was amplified using qPCR for each sample. In a total reaction volume of 25 μl, 2 μl of sample DNA was used along with 10 μM probe, 10 μM of each primer, water, and TaqMan Universal PCR Master Mix (containing AmpliTaq Gold DNA Polymerase, dNTPs, and dUTP). The sequences for primers and probe follow: Forward- 5′-GTA ATT GGA ATG ATA GGA ATT TAC AAG GT-3′, Reverse- 5′-TCA ACT ACG AAC GTT TTA ACT GCA AC-3′, Probe- 5′-FAM GAA CGG GAG GTT AAC AA MGB-3′. PCR conditions were 15 minutes at 95°C, followed by 45 cycles of 15 seconds at 95°C and 1 minute at 60°C. For quantification, a standard curve was generated and run on each plate, as well as a no DNA control. Standard curve DNA was derived from purified NF54 parasite DNA and quantified using SYBR Green. This DNA was diluted to 3, 1.5, 0.75, 0.375, 0.188, 0.094, and 0.047 ng per μl, and each standard and sample was run in duplicate with the final quantity expressed as the mean of both values.
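Quantification against the standard curve can be sketched as an ordinary least-squares fit of threshold cycle (Ct) against log10 concentration, followed by interpolation of each sample replicate. This is a generic illustration, not the instrument software's calculation: the concentrations are those listed above, while the function names and any Ct values supplied to them are hypothetical.

```python
import math

# Standard concentrations in ng per microliter, as listed in the text.
STANDARDS = [3, 1.5, 0.75, 0.375, 0.188, 0.094, 0.047]


def fit_standard_curve(concs, cts):
    """Least-squares fit of Ct = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(ct_replicates, slope, intercept):
    """Mean interpolated concentration over the replicate wells."""
    quantities = [10 ** ((ct - intercept) / slope) for ct in ct_replicates]
    return sum(quantities) / len(quantities)
```

Averaging the interpolated quantities, rather than the Ct values, follows the statement that the final quantity is the mean of both duplicate values.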
Availability of supporting data
The data set supporting the results of this article is available in the NCBI Gene Expression Omnibus data repository, Accession: GSE56305, http://www.ncbi.nlm.nih.gov/geo/.
SNP: Single nucleotide polymorphism
qPCR: Quantitative polymerase chain reaction
This work was supported by the Howard Hughes Medical Institute, the Doris Duke Charitable Foundation and National Institutes of Health Grant R01AI10171302 (to S.T.-H).
- Thera MA, Plowe CV: Vaccines for malaria: how close are we? Annu Rev Med. 2012, 63: 345-357.
- Dondorp AM, Fairhurst RM, Slutsker L, Macarthur JR, Breman JG, Guerin PJ, Wellems TE, Ringwald P, Newman RD, Plowe CV: The threat of artemisinin-resistant malaria. N Engl J Med. 2011, 365: 1073-1075.
- Takala-Harrison S, Clark TG, Jacob CG, Cummings MP, Miotto O, Dondorp AM, Fukuda MM, Nosten F, Noedl H, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders DL, Socheat D, Ariey F, Phyo AP, Starzengruber P, Fuehrer HP, Swoboda P, Stepniewska K, Flegg J, Arze C, Cerqueira GC, Silva JC, Ricklefs SM, Porcella SF, Stephens RM, Adams M, et al: Genetic loci associated with delayed clearance of Plasmodium falciparum following artemisinin treatment in Southeast Asia. Proc Natl Acad Sci U S A. 2013, 110: 240-245.
- Miotto O, Almagro-Garcia J, Manske M, MacInnis B, Campino S, Rockett KA, Amaratunga C, Lim P, Suon S, Sreng S, Anderson JM, Duong S, Nguon C, Chuor CM, Saunders D, Se Y, Lon C, Fukuda MM, Amenga-Etego L, Hodgson AV, Asoala V, Imwong M, Takala-Harrison S, Nosten F, Su XZ, Ringwald P, Ariey F, Dolecek C, Hien TT, Boni MF, et al: Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet. 2013, 45: 648-655.
- Ariey F, Witkowski B, Amaratunga C, Beghain J, Langlois AC, Khim N, Kim S, Duru V, Bouchier C, Ma L, Lim P, Leang R, Duong S, Sreng S, Suon S, Chuor CM, Bout DM, Menard S, Rogers WO, Genton B, Fandeur T, Miotto O, Ringwald P, Le BJ, Berry A, Barale JC, Fairhurst RM, Benoit-Vical F, Mercereau-Puijalon O, Menard D: A molecular marker of artemisinin-resistant Plasmodium falciparum malaria. Nature. 2014, 505: 50-55.
- Cheeseman IH, Miller BA, Nair S, Nkhoma S, Tan A, Tan JC, Al SS, Phyo AP, Moo CL, Lwin KM, McGready R, Ashley E, Imwong M, Stepniewska K, Yi P, Dondorp AM, Mayxay M, Newton PN, White NJ, Nosten F, Ferdig MT, Anderson TJ: A major genome region underlying artemisinin resistance in malaria. Science. 2012, 336: 79-82.
- Volkman SK, Sabeti PC, DeCaprio D, Neafsey DE, Schaffner SF, Milner DA, Daily JP, Sarr O, Ndiaye D, Ndir O, Mboup S, Duraisingh MT, Lukens A, Derr A, Stange-Thomann N, Waggoner S, Onofrio R, Ziaugra L, Mauceli E, Gnerre S, Jaffe DB, Zainoun J, Wiegand RC, Birren BW, Hartl DL, Galagan JE, Lander ES, Wirth DF: A genome-wide map of diversity in Plasmodium falciparum. Nat Genet. 2007, 39: 113-119.
- Manske M, Miotto O, Campino S, Auburn S, Magro-Garcia J, Maslen G, O’Brien J, Djimde A, Doumbo O, Zongo I, Ouedraogo JB, Michon P, Mueller I, Siba P, Nzila A, Borrmann S, Kiara SM, Marsh K, Jiang H, Su XZ, Amaratunga C, Fairhurst R, Socheat D, Nosten F, Imwong M, White NJ, Sanders M, Anastasi E, Alcock D, Drury E, et al: Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature. 2012, 487: 375-379.
- Tan JC, Miller BA, Tan A, Patel JJ, Cheeseman IH, Anderson TJ, Manske M, Maslen G, Kwiatkowski DP, Ferdig MT: An optimized microarray platform for assaying genomic variation in Plasmodium falciparum field populations. Genome Biol. 2011, 12: R35.
- Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, Djimde A, Doumbo OK, Kiara SM, Nzila A, Borrmann S, Marsh K, Michon P, Mueller I, Siba P, Jiang H, Su XZ, Amaratunga C, Socheat D, Fairhurst RM, Imwong M, Anderson T, Nosten F, White NJ, Gwilliam R, Deloukas P, MacInnis B, Newbold CI, Rockett K, Clark TG, et al: Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay. PLoS One. 2011, 6: e20251.
- Jiang H, Yi M, Mu J, Zhang L, Ivens A, Klimczak LJ, Huyen Y, Stephens RM, Su XZ: Detection of genome-wide polymorphisms in the AT-rich Plasmodium falciparum genome using a high-density microarray. BMC Genomics. 2008, 9: 398.
- Kidgell C, Volkman SK, Daily J, Borevitz JO, Plouffe D, Zhou Y, Johnson JR, Le RK, Sarr O, Ndir O, Mboup S, Batalov S, Wirth DF, Winzeler EA: A systematic map of genetic variation in Plasmodium falciparum. PLoS Pathog. 2006, 2: e57.
- Teo YY, Inouye M, Small KS, Gwilliam R, Deloukas P, Kwiatkowski DP, Clark TG: A genotype calling algorithm for the Illumina BeadArray platform. Bioinformatics. 2007, 23: 2741-2746.
- Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics. 2007, 8: 485-499.
- Dharia NV, Sidhu AB, Cassera MB, Westenberger SJ, Bopp SE, Eastman RT, Plouffe D, Batalov S, Park DJ, Volkman SK, Wirth DF, Zhou Y, Fidock DA, Winzeler EA: Use of high-density tiling microarrays to identify mutations globally and elucidate mechanisms of drug resistance in Plasmodium falciparum. Genome Biol. 2009, 10: R21.
- Takala SL, Plowe CV: Genetic diversity and malaria vaccine design, testing and efficacy: preventing and overcoming ‘vaccine resistant malaria’. Parasite Immunol. 2009, 31: 560-573.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.