Detection of genome-wide polymorphisms in the AT-rich Plasmodium falciparum genome using a high-density microarray
© Jiang et al; licensee BioMed Central Ltd. 2008
Received: 27 May 2008
Accepted: 25 August 2008
Published: 25 August 2008
Genetic mapping is a powerful method to identify mutations that cause drug resistance and other phenotypic changes in the human malaria parasite Plasmodium falciparum. For efficient mapping of a target gene, it is often necessary to genotype a large number of polymorphic markers. Currently, a community effort is underway to collect single nucleotide polymorphisms (SNP) from the parasite genome. Here we evaluate the polymorphism detection accuracy of a high-density 'tiling' microarray with 2.56 million probes by comparing single feature polymorphism (SFP) calls from the microarray with known SNP among parasite isolates.
We found that probe GC content, SNP position in a probe, probe coverage, and signal ratio cutoff values were important factors for accurate detection of SFP in the parasite genome. We established a set of SFP calling parameters that could predict mSFP (SFP called by multiple overlapping probes) with high accuracy (≥ 94%) and identified 121,087 mSFP genome-wide from five parasite isolates including 40,354 unique mSFP (excluding those from multi-gene families) and ~18,000 new mSFP, producing a genetic map with an average of one unique mSFP per 570 bp. Genomic copy number variation (CNV) among the parasites was also cataloged and compared.
A large number of mSFP were discovered from the P. falciparum genome using a high-density microarray, most of which were in clusters of highly polymorphic genes at chromosome ends. Our method for accurate mSFP detection and the mSFP identified will greatly facilitate large-scale studies of genome variation in the P. falciparum parasite and provide useful resources for mapping important parasite traits.
Malaria parasites, particularly Plasmodium falciparum, impose heavy economic and health burdens on human populations worldwide. Hundreds of millions of people are infected by the parasite each year, leading to 1–2 million deaths annually. The lack of effective vaccines and the emergence of drug-resistant parasites and insecticide-resistant mosquito vectors are the main reasons for the failure to control the parasites and the associated disease. A better understanding of the molecular mechanisms of drug resistance, the molecular basis of the host immune response, and the strategies the parasite employs to evade host immunity is critical for vaccine and drug development.
Genetic variation in parasites can contribute to drug resistance, immune evasion, and disease manifestation. Genetic mapping is a powerful approach for identifying mutations that cause drug resistance and changes in other phenotypes. For efficient mapping of a target gene, it is often necessary to genotype a large number of polymorphic markers. In addition to length polymorphisms such as microsatellites and minisatellites and large-scale sequencing, genome-wide single nucleotide polymorphisms (SNP) have been identified from many organisms, including P. falciparum, for genotyping and mapping genes associated with different phenotypes [3–5]. High-throughput SNP typing methods have also been developed [6–11], leading to recent successful identification of candidate genes (loci) associated with various human diseases [12–20].
One of the high-throughput typing methods is array-based hybridization. In this method, labeled genomic DNA is hybridized to microarrays comprising high-density short oligonucleotides designed based on known SNP or systematically tiled along all chromosomes to detect potential polymorphisms. High-density arrays have been successfully used to detect variation in copy number [21–23] and SNP [24, 25]. The human malaria parasite P. falciparum has a genome with extremely high AT content (> 80%) as well as numerous repetitive sequences, making array design and data analysis challenging. Hybridizations of P. falciparum genomic DNA to both Affymetrix GeneChips® and slides printed with 70 mer oligonucleotides have been reported previously [27–29]. Kidgell et al. recently used an array with 327,782 probes to identify 23,653 single feature polymorphisms (SFP) among 14 isolates. The results from that study suggest that high-density arrays could be a promising tool for high-throughput detection of genome variations including SNP and copy number variations (CNV). However, calling SNP based on hybridization signals is a complex process, and many factors can affect SNP calling, including array design, GC content of a probe, the position of the SNP in a probe, hybridization conditions, and algorithms used to analyze array signals. Additionally, methods were developed to call SFP in many previous studies, but the accuracy of the SFP calls was not verified with known SNP or through DNA sequencing. To investigate the influences of these factors on calling SFP in a highly AT-rich genome and to develop a reliable method for calling SFP from the P. falciparum genome using commercially available array platforms, we analyzed data from a high-density 'tiling' array with ~2.5 million 25 mer probes designed at The Sanger Institute (PFSANGER GeneChips®) to detect genomic variations in five P. falciparum field isolates.
Genomic DNA samples from the five parasite isolates were hybridized to the array, and signals from the parasites were compared with known SNP to evaluate SNP calling accuracy under different conditions. Based on the comparison, we identified factors that could affect probe/DNA hybridization dynamics and established a set of conditions that allowed us to call SFP/SNP with ≥ 94% accuracy. We also sequenced 52 SFP calls that did not agree with known SNP and found that ~64% of the 'wrong' calls were actually due to errors in the genome sequences. Parameters that provided the best SNP calling accuracy were used to identify 121,087 potential SNP, including ~18,000 new SFP that have not been reported previously.
Basic probe statistics and quality control
The array has 2.56 million perfect-matched probes (25 mer) with 2,206,371 P. falciparum-specific probes (the rest of the probes were for rodent malaria parasites). Of the P. falciparum probes, 2,107,319 mapped uniquely to the genome and 99,052 mapped to more than one location or were not assigned to any chromosomes. Among the unique probes, 1,446,824 were in the predicted coding regions (CDS); 1,304,180 probes were within exons; 727,200 probes were intergenic; 84,622 were within introns; 58,022 probes spanned exon/intron junctions, and 32,347 probes spanned the predicted translation start sites or stop codons.
Genomic DNA from five different parasites (Additional file 1) was labeled and hybridized (2–4 replicates) to the PFSANGER GeneChip®. After normalization of the hybridization signals across all array chips, an average signal intensity for each probe was calculated from replicates of each parasite. The quality of the hybridizations was evaluated using various methods including MA plots, scatter plots (data not shown), and coefficient of variance (CV) tests (Additional file 1). Good reproducibility was obtained among replicates, with the majority of the probes (> 90%) having a CV of less than 25% (Additional file 1). Histograms of signal ratios relative to 3D7, the reference genome, showed similar data distributions among different parasite samples (Additional file 2).
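The CV test described above can be illustrated with a minimal sketch. It assumes linear-scale normalized intensities per probe across replicates and uses the sample standard deviation; the function names and the toy intensities are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(values) / m

def fraction_below(probe_replicates, cv_threshold=25.0):
    """Fraction of probes whose replicate CV is below cv_threshold (percent)."""
    cvs = [coefficient_of_variation(v) for v in probe_replicates.values()]
    return sum(cv < cv_threshold for cv in cvs) / len(cvs)

# Hypothetical intensities for three probes, three replicates each.
replicates = {
    "probe_a": [1000.0, 1100.0, 950.0],
    "probe_b": [200.0, 210.0, 190.0],
    "probe_c": [500.0, 900.0, 100.0],
}
share = fraction_below(replicates)  # two of the three toy probes pass
```

In this toy input, the third probe varies widely between replicates and is the only one that fails the 25% threshold.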
Probe coverage of known SNP
Accurate SNP calling and detection of insertions/deletions require optimization of calling parameters. Here we evaluated potential factors that might affect SFP calling accuracy by comparing known SNP between 3D7 and four other parasites (Dd2, HB3, 7G8, and FCR3) identified in our previous study (i.e., NIAID SNP) with hybridization signal ratios. Among the 3,836 NIAID SNP (excluding 82 that were mapped to multiple sites) identified previously, 2,651 (69%) were covered by 10,841 probes, including 1,787 covered by 5,600 probes in the predicted exons. The majority of the SNP were covered by 1–5 probes (average 4.4 probes/SNP), with a maximum coverage of 45 probes/SNP (Additional file 3). Overall, the SNP were distributed evenly across the 25 mer positions in the probe, with ~94% of probes having one SNP (Additional file 4).
Probe GC content and hybridization intensity
Substitution positions in a probe and hybridization dynamics
Estimates of correct SFP call rates
Comparison of correct mSFP calling rates using different cut off values
Sequencing verification of SFP calls
DNA sequencing verification of false negative (Fn) and false positive (Fp) calls
Use of receiver operating characteristic (ROC) curves to estimate call rates
SFP were called using Z-scores of 1.5, 2.0, 3.0 and 4.0 and compared with SFP called using signal ratio cutoffs of 1.5, 2.0, 3.0, and 5.0. Results from cutoffs of Z-score of 3.0 and signal ratio of 3.0 had the best overall matches (~99%) and the best positive SFP call matches (~82%) for all 14 chromosomes. To minimize Fp calls (low Fp rate is important for genetic mapping) from unknown parasites that might have higher background, however, we decided to use a conservative signal ratio cutoff value of 5.0. Using this cutoff value, almost all (~98%) of the positive calls matched a positive call from a Z-score cutoff 3.0.
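To make the idea of comparing cutoffs concrete, the following minimal sketch computes true-positive and false-positive rates at several signal ratio cutoffs, using known SNP as the truth set. It assumes the ratio is reference (3D7) signal divided by isolate signal, so a larger ratio means a stronger signal loss; the function name and the toy data are hypothetical.

```python
def roc_points(ratios, is_snp, cutoffs):
    """Return (cutoff, true-positive rate, false-positive rate) tuples.

    ratios  -- signal ratio per probe (assumed: reference / isolate)
    is_snp  -- True where the probe covers a known SNP in the isolate
    """
    positives = sum(is_snp)
    negatives = len(is_snp) - positives
    points = []
    for cutoff in cutoffs:
        calls = [r > cutoff for r in ratios]  # SFP if the ratio is above the cutoff
        tp = sum(c and s for c, s in zip(calls, is_snp))
        fp = sum(c and not s for c, s in zip(calls, is_snp))
        points.append((cutoff, tp / positives, fp / negatives))
    return points

# Hypothetical toy data: six probes, three of which cover a known SNP.
ratios = [6.2, 3.4, 1.8, 1.1, 2.2, 0.9]
is_snp = [True, True, True, False, False, False]
points = roc_points(ratios, is_snp, cutoffs=[1.5, 2.0, 3.0, 5.0])
```

Raising the cutoff in this toy example removes the false positive first and then starts to lose true positives, which is the trade-off a conservative cutoff accepts.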
Detection of genome-wide substitutions among field isolates
Summary of mSFP calls for the 14 chromosomes among five parasite isolates
Some chromosomes appeared to have unusually large numbers of mSFP calls from some parasites. For example, Dd2 had 1,636 unique mSFP from chromosome 2, whereas the other four parasites had fewer than 400 mSFP (Table 3). Close inspection of the calls revealed that the majority of the extra mSFP were from a deletion at one end of chromosome 2 in Dd2 (Additional files 8 and 9). Similarly, the higher numbers of mSFP from chromosomes 12, 13, and 14 of HB3 were from specific regions either deleted or having highly polymorphic genes in a specific parasite (Additional files 8 and 9).
Genome-wide mSFP distribution
We noticed that many of the PlasmoDB SNP (51.1%) were located on chromosomal regions that did not have probe coverage (Figure 4). Because the majority of the regions without probe coverage were likely in areas of AT-rich repetitive and/or noncoding sequences, the observation suggested that relatively larger numbers of SNP in the PlasmoDB could be from repetitive sequences.
We next counted mSFP in windows of 10-kb segments and plotted mSFP from each segment along the chromosomes to investigate mSFP distribution on the chromosomes from each parasite (Additional file 8). Again, these plots showed clusters of some highly polymorphic regions, mostly at chromosome ends, corresponding to var/rif/stevor clusters. The plots also identified some unique peaks for individual parasites, for example, unique peaks on chromosome 2 for Dd2 and for HB3. These unique peaks were likely due to deleted DNA segments or reflected the unique selection and evolutionary histories of an individual parasite (Additional file 8).
The majority of the regions with reduced signals (blue) were located at chromosome ends or in regions containing the var/rif/stevor gene clusters, reflecting the highly variable nature of these DNA regions (Figure 5). Although it is difficult to distinguish highly polymorphic regions from deletions in this haploid genome, we considered several additional restrictions to exclude potential polymorphic loci. A segment was considered not truly deleted if it contained known highly polymorphic genes such as var/rif/stevor or if it had reduced signals in all four parasites (suggesting highly polymorphic genes such as genes encoding surface proteins). Segments with reduced signal ratios in only one or two parasites were more likely to be true deletions, which could also be detected in mSFP distribution plots (Additional file 8). For example, a deleted ~42-kb segment (PFB0070w-PFB0100c) on chromosome 2 of Dd2 and FCR3 was found to contain a gene encoding knob-associated histidine-rich protein (KAHRP). Deletion of KAHRP in Dd2 was reported previously [28, 29, 33]. Another likely deleted segment was a ~98-kb region on chromosome 9 of HB3 containing 19 genes (PFI1710w-PFI1800w) including the genes encoding cytoadherence linked asexual protein (CLAG) and lysophospholipase. Again, deletion of this region had been reported. A list of chromosome segments and mapped genes potentially amplified or deleted/highly polymorphic, including those reported previously, can be found in Additional file 9.
The PFSANGER array, despite having ~2.2 million P. falciparum probes, was not designed specifically for SNP detection, and whether it was suitable for SNP detection was not certain. This study was initiated to investigate the possibility of using the PFSANGER array for genetic mapping and population studies. The large number of probes on the chip and their high AT content (some > 80%) require critical evaluation of factors that may affect hybridization dynamics before SFP can be reliably called. Based on comparison of mSFP calls with known SNP identified previously, we showed that the last two end positions in a probe had limited influence on hybridization signal and that probes with GC contents lower than 16% should be excluded for SFP calling in this genome. Resequencing also showed that SFP calls based on a single probe were not reliable. For a potential mSFP call, a conservative signal cutoff ratio of 3.0–5.0 should be applied together with a vote among several adjacent probes (within 25 bp) in which a majority of the probes (at least 75%) support the call. We demonstrated that this particular microarray could be successfully employed to detect mSFP with high calling accuracy (≥ 94%). This work provides important information for calling mSFP in the P. falciparum genome using microarrays.
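The voting rule recommended here can be written down as a short sketch. It assumes the caller has already collected the signal ratios of the probes that lie within 25 bp of one another; the function name, the default arguments, and the example ratios are illustrative assumptions rather than the study's in-house code.

```python
def msfp_vote(ratios, cutoff=5.0, min_fraction=0.75, min_probes=2):
    """Call an mSFP when enough adjacent probes exceed the ratio cutoff.

    ratios -- signal ratios of the probes within 25 bp of one another
    """
    if len(ratios) < min_probes:
        return False  # single-probe calls are treated as unreliable
    positive = sum(r > cutoff for r in ratios)
    return positive / len(ratios) >= min_fraction

# Hypothetical probe groups.
supported = msfp_vote([6.0, 7.5, 5.2, 1.1])   # 3 of 4 probes above cutoff
split = msfp_vote([6.0, 1.2, 1.0, 5.5])       # 2 of 4 probes above cutoff
single = msfp_vote([9.0])                     # only one probe
```

Only the first group reaches the 75% majority; the second is split evenly and the third rests on a single probe.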
We used a cutoff ratio of 5.0 in calling SFP because, for genetic mapping, a high Fp rate may lead to misleading results and should be minimized. A higher cutoff value may also result in a higher Fn rate, that is, some missed calls. Missing some calls is not a major concern because the array can detect a large number of SFP. The 5.0 cutoff therefore represents a conservative value for minimizing Fp calls, considering potential higher backgrounds that may exist in some field isolates such as FCR3 in this study. The higher background in FCR3 requires further investigation, although signal intensity and distribution from this parasite appeared to be similar to those from other parasites (Additional files 1 and 2). A sample mixed with a smaller percentage of DNA from a different genotype (strain) may increase the hybridization background signal. Indeed, typing DNA from the FCR3 parasite with microsatellites showed that the DNA sample appeared to contain a secondary peak in some markers (data not shown). If this is true, a sample with high background may have to be discarded.
Using an array with a much higher density of probes than those published previously [27–29], we identified 121,087 mSFP from five isolates, including ~18,000 new mSFP after excluding mSFP from multigene families. Among the 121,087 mSFP, ~67% were in clusters of highly polymorphic genes such as var/rif/stevor. Approximately 89% of our mSFP calls that also had probes spanning known SNP in PlasmoDB matched the SNP, reflecting the relatively high accuracy of our mSFP calls, although our stringent cutoff values may lead to higher Fn rates or "no-calls" (such as excluding single probe calls). Our mSFP also provided additional evidence confirming the SNP reported previously, which is important because the majority of SNP in PlasmoDB were generated from shotgun sequences and the sequence alignments have not been visually inspected or adjusted. For a genome with a large number of repetitive sequences, alignment errors can arise if sequence alignment relies entirely on computer software.
Distributions of mSFP across the chromosomes among the parasites were very similar except for a few unique peaks that may reflect deletion or amplification in each individual parasite. Excluding the mSFP from the multigene families, we obtained 40,354 mSFP or approximately 570 bp per SFP in the genome, a frequency that is within the range (519–976 bp per SNP) of our previous estimates and similar to an estimate of 446 bp per SNP by another group. If we consider 45% of the 40,354 mSFP from five isolates as common mSFP, as estimated previously, we can expect ~18,000 common mSFP in the five parasite genomes that will be useful for genetic mapping.
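The two figures in this paragraph can be checked with simple arithmetic. The sketch below assumes a nuclear genome size of roughly 23 Mb, a value that is not stated in this section and is used here only for illustration; the variable names are hypothetical.

```python
# Assumption: ~23 Mb nuclear genome; this value is not given in the text.
GENOME_SIZE_BP = 23_000_000
unique_msfp = 40_354       # mSFP after excluding multigene families
common_fraction = 0.45     # share of mSFP considered common

bp_per_msfp = GENOME_SIZE_BP / unique_msfp      # average spacing in bp
common_msfp = common_fraction * unique_msfp     # expected common mSFP

print(round(bp_per_msfp), round(common_msfp))
```

Under that assumed genome size, the spacing comes to about 570 bp per mSFP and the expected number of common mSFP to about 18,000, in line with the values quoted above.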
The highly AT-rich P. falciparum genome has a large number of repetitive sequences and low complexity regions in protein coding sequences [35–37]. The non-coding regions constitute more than 40% of the genome and generally have AT content >90% with large numbers of polymorphic AT repeats and polyA/T tracts [26, 38]. These high-AT regions not only present a problem for genome sequencing and DNA sequence alignment but also make it difficult to design sequence-specific probes with reliable hybridization dynamics. SNP in these regions may not be very useful for mapping purposes because of difficulty in designing oligonucleotide probes or PCR primers for genotyping. Indeed, analyses of signal intensity from probes with different GC contents showed that probes with GC contents <16% produced similarly low signals, suggesting that these probes might not be practical for calling mSFP. Of interest, probes with GC content >50% also produced highly variable signals. High-GC probes from the variable var genes can partly account for this variation. We excluded probes with GC content >50% for several reasons: 1) approximately 44% of the probes with GC content >50% were var probes that should be discarded; 2) probes with high GC content would have higher 'affinity' than those with lower GC content during hybridization, so a substitution in a probe with high GC content may not reduce the hybridization signal as much as one in a probe with low GC content; 3) there were only ~3,000 probes with GC contents >50%. Exclusion of these probes should not have a significant impact on our SFP calls.
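The GC-content filter discussed here is simple to express in code. The following is a minimal sketch, assuming probes are given as plain DNA strings; the function names and the example sequences are hypothetical.

```python
def gc_percent(probe):
    """Percentage of G and C bases in a probe sequence."""
    seq = probe.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def passes_gc_filter(probe, low=16.0, high=50.0):
    """Keep probes whose GC content lies within [low, high] percent."""
    return low <= gc_percent(probe) <= high

# Hypothetical 25 mer probes.
at_rich = "A" * 22 + "GCG"             # 3 of 25 bases are G/C (12%)
borderline = "A" * 21 + "GCGC"         # 4 of 25 bases are G/C (16%)
gc_rich = "GC" * 6 + "G" + "A" * 12    # 13 of 25 bases are G/C (52%)
kept = [p for p in (at_rich, borderline, gc_rich) if passes_gc_filter(p)]
```

For a 25 mer, the 16% lower bound corresponds to at least four G or C bases, and the 50% upper bound to at most twelve.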
The P. falciparum chromosomes have been shown to be highly variable in size in pulse-field gel electrophoresis (PFG). Genomic segmentation analysis to detect chromosome deletion and amplification showed relatively few amplification/deletion events with segment size > 0.3 kb. The variation in chromosome sizes seen in PFG gels could be mainly due to chromosome translocation, which is difficult if not impossible to detect using microarrays. One of the amplified regions was a segment on chromosome 5 containing the pfmdr1 gene in the Dd2 and FCR3 parasites. Amplification of the pfmdr1 locus has been reported [28, 29, 33] and could be due to drug selection pressure. Similarly, there were few deletions larger than 10 kb; many of the deleted/amplified regions detected in our study matched well with those reported previously [28, 29]. Two well-known deleted regions, on chromosomes 2 and 9, respectively, were detected in our analyses [34, 41]. Detection of previously reported deletions suggested that our methods for detecting deletion/amplification were working properly. However, using an array with higher probe density than previous studies, we also discovered many deletions/amplifications that have not been described previously (Additional file 9). We identified 181 amplified and 536 highly variable or deleted genes or fragments, 74 (40.9%) and 30 (5.6%) of which, respectively, were reported previously [28, 29, 33]. Some of the discrepancies were likely due to the different filtering criteria used (e.g. cutoff ratios, minimum number of probes, length cutoff of segment). Because of our small parasite sample size, it is difficult to make any functional inferences from the amplifications and deletions found in this study, although amplification at the pfmdr1 locus may be associated with responses to some anti-malarial drugs [40, 42], and amplification of chromosome 4 in FCR3 may contribute to its adaptation to higher growth rates.
This study developed methods for accurate detection of mSFP and CNV in the P. falciparum genome after evaluating factors that can influence DNA hybridization dynamics. More than 120,000 mSFP, including ~18,000 new and unique mSFP, and various chromosomal amplification/deletions were identified from the P. falciparum genome. Nearly 70% of the polymorphic sites are in clusters of var/rif/stevor gene families. Use of this array to analyze DNA samples from large numbers of parasites will facilitate our understanding of parasite diversity and evolution and genetic mapping of important parasite traits.
Parasites and parasite culture
P. falciparum parasite isolates used in this study have been described [4, 43]. The parasites were cultured in vitro according to the methods of Trager and Jensen. Briefly, parasites were maintained in RPMI 1640 medium containing 5% human O+ erythrocytes (5% hematocrit), 0.5% Albumax (GIBCO, Life Technologies, Grand Island, NY), 24 mM sodium bicarbonate, and 10 μg/ml gentamycin at 37°C with 5% CO2, 5% O2, and 90% N2.
DNA extraction and probe labeling
Parasites were cultured to a parasitemia of 5% or higher, and the cultures were centrifuged at 5,000 g to collect red blood cells, which were lysed by the addition of 10 vol of 0.1% saponin in PBS. The parasites were centrifuged again, and genomic DNA was extracted from the parasite pellet using the Wizard Genomic DNA Purification kit (Promega, Madison, WI). Genomic DNA (10 μg) from each parasite was used as probes in the hybridizations. Briefly, genomic DNA was fragmented to an average size of 50–150 bp with DNase I, and the quality of the digested DNA was evaluated in 2% agarose gels. Subsequently, fragmented DNA was end-labeled using terminal deoxynucleotidyl transferase and a biotin labeling kit (Affymetrix mapping 250 K reagent kit; Affymetrix, Inc., Santa Clara, CA).
The PFSANGER GeneChip® was purchased from Affymetrix, Inc. Array hybridization was performed at the microarray facility of the Laboratory of Immunopathogenesis and Bioinformatics, SAIC-Frederick, Inc (Frederick, MD). Briefly, biotin-labeled DNA was hybridized to array chips at 45°C for 16 h with constant rotation at 60 rpm. Affymetrix 20× hybridization control was used to make the hybridization cocktail. Hybridized chips were washed and stained following the company's EukGE-WS2v5 protocol. The chips were then scanned at 570 nm emission wavelength using an Affymetrix scanner 3000. All the parasites had two or more biological replicates (Additional file 1).
Microarray chip design and data analysis
The probes were designed based on the P. falciparum genome (3D7) sequence v2.1.1, covering genomic regions where unique probes with a reasonably broad 'thermal' range could be designed. A brief description of the array design has been reported recently. Because of recent updates of genome databases, all probe sequences were reassigned new coordinates along each chromosome and their relative positions in a predicted gene (exon, intron, across exon and intron, and intergenic regions) according to the 3D7 genome sequence in PlasmoDB V5.2. The scanned image CEL files were processed and analyzed using the R/Bioconductor package and the robust multichip analysis method. Basically, the programs retrieved probe information (perfect match only), performed background subtraction, quantile-normalized signals from the chips, and transformed the data into a final normalized data matrix of log2 values. Partek Genomics Suite 6.3 (Partek Inc., St. Louis, MO) and in-house programs were also used in SFP calling and copy number analyses.
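The quantile normalization step mentioned above can be illustrated with a small, dependency-free sketch. It is not the R/Bioconductor implementation used in the study: it omits background subtraction, breaks ties by input order, and the function name and toy intensities are hypothetical.

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize equal-length intensity lists (one list per chip)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rising intensity
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference quantile
        normalized.append(out)
    return normalized

# Hypothetical intensities for three probes on two chips.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
log2_matrix = [[math.log2(v) for v in chip] for chip in normalized]
```

After normalization every chip shares the same set of intensity values, so differences between chips reflect only the rank of each probe within its chip.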
Mapping known SNP to array probes
After determining the correct genomic coordinates for each SNP and each array probe, known SNP from our previous study and those in PlasmoDB [3, 5, 28, 45] were mapped to probes that covered known SNP positions. Ambiguous SNP (mapped to multiple positions) were removed, and the remaining SNP were uploaded to a genome browser with allele information from different parasites.
Because the signals from the probes do not allow for accurate mapping of the position of a SNP within a probe at the given probe density, we can only assert that somewhere within a probe there is likely a polymorphism. Therefore, we simply assigned the polymorphism to a feature (probe) and called it a single feature polymorphism (SFP) as described. Because a polymorphic site was often covered by multiple probes (average ~4 probes), we treated calls from probes within 25 bp as one SFP (called mSFP). To establish optimal parameters for SFP calling, we investigated SFP calling rates and calling accuracies using various conditions. We first identified all of the probes that covered each SNP identified in our previous study. Then we extracted their hybridization signals from a normalized data file. The average probe intensity (average of antilogs of the raw data) from the normalized data for all replicates of each parasite isolate was calculated. This value was compared with the average signal for 3D7 obtained in the same way to obtain a signal ratio. We evaluated the influences of SNP position in a probe, GC content of a probe, cutoff ratios of hybridization signal, and numbers of probes on SFP calling accuracy. Probes with GC content < 16% or > 50% and probes with multiple hits in the genome were excluded from the analyses. The last two nucleotides at each end of a probe were also discarded, because substitutions at these positions had minimal influence on hybridization signals.
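The ratio calculation and the end-position rule described above can be sketched as follows. The sketch assumes the normalized data are log2 values, that the ratio is the mean 3D7 intensity divided by the mean isolate intensity, and that SNP offsets within a probe are 0-based; these conventions and all function names are assumptions made for illustration.

```python
def mean_intensity(log2_values):
    """Average replicate signals on the linear scale (antilog of log2)."""
    return sum(2.0 ** v for v in log2_values) / len(log2_values)

def signal_ratio(reference_log2, isolate_log2):
    """Ratio of mean reference (3D7) intensity to mean isolate intensity."""
    return mean_intensity(reference_log2) / mean_intensity(isolate_log2)

def snp_offset_is_informative(offset, probe_length=25, trim=2):
    """True unless the SNP falls in the last `trim` bases at either probe end."""
    return trim <= offset < probe_length - trim

# Hypothetical replicate log2 signals for one probe.
ratio = signal_ratio(reference_log2=[10.0, 10.0], isolate_log2=[7.0, 7.0])
is_sfp = ratio > 5.0  # compare with the conservative cutoff of 5.0
```

Averaging on the linear scale rather than the log scale follows the "average of antilogs" wording; averaging log2 values directly would give a geometric mean instead.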
Once optimal parameters were identified for calling SNP using the NIAID SNP as an input set to test the method, we applied a similar procedure to a whole genome scan for probe-based SFP and mSFP (Additional file 9). Probe ratios were computed for each parasite for each probe, and raw alleles were generated by applying the cutoff ratio of 5.0: a probe was called an SFP if its ratio was above the cutoff value and not an SFP otherwise. Next, going through one parasite at a time, all probes were considered where there was more than one positive probe in a row within 25 bp of one another. Once this filtered set of probes was extracted from the full set, the ratios of intensity for each of the isolates compared with 3D7 were computed and tabulated. From this table, a vector was constructed for each parasite isolate where either a '1' or a '0' was added to each position determined by the value of the ratio. This vector was then scanned for stretches of '1's where the distance between the probes was less than 25 bp. In cases where longer stretches were identified, they were output as an additional feature type called long multiprobe polymorphism. Because some probes represent different strands of the exact same sequence region, we also discarded those stretches of '1's where the probes on either strand had a distance of 0 bp from the neighboring probe but did not exceed the threshold ratio value. All of the multiprobe polymorphisms corresponding to the mSFP were then output, and both classes of polymorphisms (single probe SFP and multi-probe mSFP) were then loaded into the genome browser. The procedure also tracked the 'alleles' by parasite isolate to determine the counts of mSFP shared by each possible combination of parasite isolates. Additional parameters that added confidence to a particular mSFP call, such as multiple parasite isolates having the same SFP and matches to known SNP in PlasmoDB, were also indicated.
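The scan for stretches of positive probes can be sketched in a few lines. This is a simplified illustration, not the in-house program: it handles one chromosome of one isolate, ignores the strand-duplicate rule and the separate "long multiprobe polymorphism" class, and uses hypothetical names and positions.

```python
def group_msfp(probes, cutoff=5.0, max_gap=25):
    """Group consecutive positive probes into mSFP runs.

    probes -- list of (start_position, ratio) tuples for one chromosome
    Returns a list of runs; each run is a list of probe start positions.
    """
    runs, current = [], []
    for start, ratio in sorted(probes):
        positive = ratio > cutoff
        if positive and current and start - current[-1] < max_gap:
            current.append(start)  # extend the current stretch of '1's
            continue
        if len(current) > 1:       # keep runs supported by two or more probes
            runs.append(current)
        current = [start] if positive else []
    if len(current) > 1:
        runs.append(current)
    return runs

# Hypothetical probes: (start position, signal ratio).
probes = [(100, 6.0), (110, 7.2), (118, 5.5), (140, 1.0),
          (300, 8.0), (700, 6.1), (712, 9.3)]
msfp_runs = group_msfp(probes)
```

In this toy input, the isolated positive probe at position 300 is dropped because no second positive probe lies within 25 bp of it.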
Estimating SFP calling rates using ROC curve and Z-score
Hybridization measurements from Affymetrix CEL files were pre-processed in the R programming environment using the read.affybatch function from the affy BioConductor package. Background adjustment was performed using the method developed for the RMA algorithm, and normalization was done using the quantile method. Differential hybridization between parasite isolates was expressed as Z-scores calculated by the LPE package [30, 50].
To verify selected mSFP (Table 2) that might be called incorrectly or calls that had contradictory signals, we amplified DNA fragments of 200–500 bp containing the probes and sequenced the PCR products directly according to methods described. Primer sequences used in PCR and DNA sequencing are listed in Table 2.
Detection of CNV
To detect CNV, we imported the filtered probe data into Partek Genomics Suite v6.3 and normalized individual probe signals from the 3D7 reference genome to 1.0 (haploid genome). Basically, the genomic segmentation algorithm finds a segment according to three criteria: 1) neighboring regions have statistically significantly different average intensities (P ≤ 0.00001); 2) breakpoints (region boundaries) are chosen to give optimal statistical significance (smallest P-value); and 3) detected regions must contain a minimum of 15 probes. After determining the segments that had average signals 1.5-fold higher or lower than those of the 3D7 reference, we filtered out regions that were less than 300 bp long. Detected segments, representing potential deletions or highly polymorphic regions, were plotted along chromosomes to produce the CN genome view (Figure 5), and the segments were mapped to predicted genes in PlasmoDB to generate Additional file 9. To screen for highly polymorphic genes among potentially deleted segments, we flagged segments containing var/rif/stevor and other multigene families.
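The post-segmentation filters can be sketched as below. The segmentation itself was done in Partek Genomics Suite and is not reproduced here; the sketch only applies the probe-count, fold-change, and length filters to segments that are assumed to be already defined. It interprets "1.5 fold higher or lower" as a mean ratio of at least 1.5 or at most 1/1.5 relative to 3D7, which is an assumption, and the dictionary keys and example segments are hypothetical.

```python
def filter_segments(segments, fold=1.5, min_probes=15, min_length=300):
    """Label segments as 'amplified' or 'reduced' after size and fold filters.

    segments -- list of dicts with keys start, end, n_probes, mean_ratio,
                where mean_ratio is isolate signal relative to 3D7 (3D7 = 1.0)
    """
    kept = []
    for seg in segments:
        if seg["n_probes"] < min_probes:
            continue  # too few probes to trust the segment
        if seg["end"] - seg["start"] < min_length:
            continue  # shorter than the length cutoff
        if seg["mean_ratio"] >= fold:
            kept.append((seg["start"], seg["end"], "amplified"))
        elif seg["mean_ratio"] <= 1.0 / fold:
            kept.append((seg["start"], seg["end"], "reduced"))
    return kept

# Hypothetical segments from one chromosome of one isolate.
segments = [
    {"start": 1000, "end": 43000, "n_probes": 900, "mean_ratio": 0.2},
    {"start": 50000, "end": 50200, "n_probes": 20, "mean_ratio": 0.3},
    {"start": 60000, "end": 62000, "n_probes": 10, "mean_ratio": 3.0},
    {"start": 70000, "end": 90000, "n_probes": 400, "mean_ratio": 2.1},
    {"start": 95000, "end": 99000, "n_probes": 80, "mean_ratio": 1.1},
]
flagged = filter_segments(segments)
```

A "reduced" label here does not distinguish a true deletion from a highly polymorphic region; that distinction relies on the additional gene-family screening described in the text.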
CNV: copy number variation
CV: coefficient of variance
LPE: local pooled error
ROC: receiver operating characteristic
SFP: single feature polymorphism
mSFP: SFP called by two or more overlapping probes
SNP: single nucleotide polymorphism.
This work was supported by the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health; the Intramural Research Program of the Center for Cancer Research, National Cancer Institute, National Institutes of Health; and in part was funded by NCI contract N01-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U. S. Government. AI was supported by the Wellcome Trust. We thank NIAID intramural editor Brenda Rae Marshall for assistance and Jun Yang and Brandie Fullmer at the Laboratory of Immunopathogenesis and Bioinformatics, SAIC-Frederick, Inc. for microarray hybridizations, and David Bennett at Partek Inc. for advanced data analysis help.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.