High-throughput 454 resequencing for allele discovery and recombination mapping in Plasmodium falciparum
© Samarakoon et al; licensee BioMed Central Ltd. 2011
Received: 23 July 2010
Accepted: 17 February 2011
Published: 17 February 2011
Knowledge of the origins, distribution, and inheritance of variation in the malaria parasite (Plasmodium falciparum) genome is crucial for understanding its evolution; however the 81% (A+T) genome poses challenges to high-throughput sequencing technologies. We explore the viability of the Roche 454 Genome Sequencer FLX (GS FLX) high throughput sequencing technology for both whole genome sequencing and fine-resolution characterization of genetic exchange in malaria parasites.
We present a scheme to survey recombination in the haploid stage genomes of two sibling parasite clones, using whole genome pyrosequencing that includes a sliding window approach to predict recombination breakpoints. Whole genome shotgun (WGS) sequencing generated approximately 2 million reads, with an average read length of approximately 300 bp. De novo assembly using a combination of WGS and 3 kb paired end libraries resulted in contigs ≤ 34 kb. More than 8,000 of the 24,599 SNP markers identified between parents were genotyped in the progeny, resulting in a marker density of approximately 1 marker/3.3 kb and allowing for the detection of previously unrecognized crossovers (COs) and many non crossover (NCO) gene conversions throughout the genome.
By sequencing the 23 Mb genomes of two haploid progeny clones derived from a genetic cross at more than 30× coverage, we captured high resolution information on COs, NCOs and genetic variation within the progeny genomes. This study is the first to resequence progeny clones to examine fine structure of COs and NCOs in malaria parasites.
Advances in genotyping technology led to an explosion of studies to identify genes of interest using classical genetic approaches. Such studies facilitate the discovery of genetic factors related to disease, drug resistance and environmental response. Different approaches evolved rapidly with improvements in sequencing technology. Additional advances in molecular biology techniques have greatly increased the speed and throughput of discovery and analysis. For example, microarray-based marker discovery has been applied to model organisms such as yeast, Arabidopsis [3, 4], and rice [5–7], and to non-model organisms including the human malaria parasite Plasmodium falciparum [8–11]; however, this platform can be susceptible to poor hybridization efficiency in low-complexity regions and to difficulties in reproducibility. Such problems are magnified in organisms with high nucleotide bias, particularly the extreme case of P. falciparum with its 80.6% (A + T) composition, resulting in limitations in genome-wide coverage and cost effectiveness.
Alternatively, massively parallel DNA sequencing technologies have revolutionized single nucleotide polymorphism (SNP) discovery and the study of genome variation of diverse categories. 454 Life Sciences' pyrosequencing technology was the first next-generation sequencing (NGS) platform to reach the commercial market, offering relatively long reads and solutions to previous bottlenecks such as library preparation, template preparation and sequencing. However, the ambiguous length of homopolymer runs, a primary limitation of this pyrosequencing-based method, may prohibit the sequencing of highly biased genomes.
High-resolution genome views provided by new sequencing technologies can be especially informative when applied to progeny clones derived from genetic crosses. Homologous recombination plays an essential role in ensuring correct chromosomal segregation during meiosis and increases genetic diversity by reshuffling haplotypes; furthermore, it can homogenize alleles through gene conversion. In current models, meiotic recombination is initiated by formation of a double-strand break (DSB). The break is repaired through a series of steps, involving end resection, synthesis and ligation, using the homologous chromosome as a template. Repair results in either a crossover (CO), i.e. reciprocal exchange accompanied by a tract subject to gene conversion, or a non-crossover (NCO), i.e. a tract subject to conversion but not associated with reciprocal exchange.
454 sequencing has been used to discover SNPs in a variety of organisms including plants [17–19], rhesus macaque, human and bacteria. This and other next-generation platforms redefine the quest for high-density marker discovery and genotyping, presenting an opportunity for obtaining a high resolution view of the genome to comprehensively link various types of allelic variants to phenotypes. For example, the 454 technology already has been applied to recombinant inbred lines of rice and soybean. Longer sequence reads combined with paired end reads will facilitate better mapping to reference genomes.
Here, we use the 454 Genome Sequencer FLX (GS FLX) platform for whole genome shotgun sequencing (WGS) to characterize the genomes of two P. falciparum progeny strains derived from a well-studied genetic cross between a multi-drug resistant parent and a generally drug sensitive parent [24, 25]. In addition to demonstrating the effectiveness of 454 WGS, we demonstrate the first high-resolution allele discovery method to monitor recombination events and their breakpoints along with other forms of genetic variation that distinguish these sibling parasite clones. We examine outcomes of meiosis that can only be recognized at nucleotide-level resolution, including genotype changes accompanying COs and NCOs that can refine our understanding of CO distribution or possible alternative double strand break resolution pathways in P. falciparum.
Results and Discussion
WGS pyrosequencing and de novo assembly
Table: Sequencing parameters of shotgun and 3 kb libraries of Plasmodium falciparum clones SC05 and 7C126 with the 454/Roche GS FLX sequencing platform. Columns: Number of sequencing plates; Number of Reads; Number of Bases. Rows include 7C126 (3 kb) and SC05 (3 kb).
Table: Comparison of de novo assembly between 454 GS FLX, Sanger and Illumina platforms. Reported statistics: Number of Scaffolds; N50 Scaffold Size (kb); Number of Contigs; N50 Contig Size (kb); Largest Contig (kb); Number of assembled bases (Mb).
Given the concern that pyrosequencing may be fallible in a highly (A+T) biased genome, we compared de novo assembly parameters for these 454-derived progeny reads with the parental genome sequence derived from standard dideoxy-based sequencing (Table 2). We demonstrate that the GS FLX performed surprisingly well with this technically challenging genome and that the increased throughput of this system affords the increased fold coverage needed for downstream applications, including genotyping and allele discovery. This study demonstrates that the higher read depth and genome coverage generated by 454 technology substantially improve the quality (e.g. confident SNP calls) and efficiency of high-throughput marker discovery relative to what can be obtained using microsatellite markers and microarray-derived single feature polymorphisms (SFPs).
We compared the results of the 454 assembly data of the progeny genomes to those of the 3D7 genome assembly generated by Illumina technology to assess the performance differences between the two NGS technologies in a highly (A+T) rich genome. While the standard library preparation method in Illumina technology did not permit de novo assembly, the improved no-PCR method enabled de novo assembly; this was comparable to the de novo assembly statistics obtained using 454 technology, at approximately 36× coverage with considerably fewer contigs than that with the Illumina modified no-PCR method. Furthermore, we aligned contigs larger than 1 kb to the 3D7 assembly (nuclear DNA - PlasmoDB v 5.4 and apicoplast/mitochondrial reference sequences) to search for segments that may be mis-assembled or missing in the current reference genome. No substantial regions were found to be missing from the current genome assembly. Only 14 of our contigs remained unaligned (6 contigs in 7C126 = 6.2 kb, and 8 contigs in SC05 = 8.9 kb); one 16 kb contig from each progeny sequence appeared to be contaminating human mitochondrial DNA. The remaining 12 contigs were < 2 kb.
SNP detection and allele identification
To establish the platform for calling parental alleles inherited in the progeny clones, we developed a four step procedure: (1) map parental reads to the reference genome (3D7, PlasmoDB v 5.4); (2) identify SNPs between parents in these mapped regions; (3) map progeny reads to the reference genome; and (4) identify parental alleles in the progeny genomes (Figure 1B,C).
Table: Mapping reads from progeny and parent genomes to the reference genome. Columns include Method of Sequencing and Unique positions mapped.
In step 2, we established a set of 24,599 high quality SNP markers by requiring uniquely mapping reads, with no mixed base calls at any SNP position, an average quality score of ≥ 30, and at least 2 reads supporting the base call identity (Additional file 2). Relaxed mapping criteria will increase the total number of SNPs detected but with the cost of decreased specificity.
For steps 3 and 4, we used the 454 GS FLX progeny clone sequence data to assess allelic variation and recombination in the context of available parental genome sequences. In step 3, a total of 1,738,923 reads from 7C126 and 1,802,733 reads from SC05 were uniquely aligned to the reference genome (Table 3). These covered 21.6 Mb positions of the reference genome (92.8%) in 7C126 and 21.7 Mb positions (93.4%) in SC05.
In step 4, each of the progeny strains (7C126 and SC05) was genotyped at the candidate SNP loci that distinguished the parents, with the added requirement of at least 10 reads supporting the base call. Although these requirements reduced the sensitivity in detecting SNPs, especially in low coverage regions, they increased the specificity of true SNP detection by lowering the likelihood of including false variants that arise due to sequencing and/or mapping errors. We analyzed only SNPs and excluded all indels and variants involving more than one nucleotide. In parallel to the work presented here, our lab developed a gene chip to resequence 45,000 SNPs cataloged in PlasmoDB (M.T. Ferdig, unpublished). Of the 24,585 SNPs identified in this study, 2,468 were encoded on the gene chip, and 2,431 of these (98.5%) produced identical base calls on the two platforms for clone 7C126. While we cannot discern at this point how much each platform contributed to the small disparity, we conclude that our accuracy in SNP calls is ≥ 98.5%.
Summary of high quality mapped base calls and allelic SNPs in progeny genomes
Genome-wide detection of COs and NCOs
Crossover breakpoint resolution depends on the density of SNP alleles, their distribution across the breakpoint region, the length of the conversion tract accompanying the CO/NCO breakpoint (Figure 4), and the sequence coverage in the region of the tract. The CO breakpoints occurred in a median breakpoint window of 88.5 kb (7C126, Minimum = 5.6 kb) and 101.8 kb (SC05, Minimum = 0.7 kb). Simple breakpoints, where one parental allele transitioned smoothly into the other parental allele (Figure 4A), and complex CO breakpoints, accompanied by a conversion tract with frequent allele changes (Figure 4C), were identified (Additional file 4). Most COs are simple, while a few COs (7C126 = 9/27, SC05 = 8/25) are associated with complex tracts. Mancera et al. describe complex patterns of genotype changes in both COs (11.5%) and NCOs (3.4%). Such complex tracts were also observed by Qi et al., and are consistent with the repair of heteroduplex DNA after Holliday junction formation and resolution [2, 30].
Single nucleotide base variants
The ability to distinguish true sequence variants from sequencing errors is a fundamental challenge in the discovery of SNP variants and genotyping efforts; thus it is important to understand the types and probabilities of error in base calls. Characteristic biases occur in sequence errors due to qualities of the queried base and the sequence context. Technical issues specific to 454 technology include: nucleotide calling difficulties within homopolymers; sequencing failure arising from incomplete homopolymer extension; base mis-incorporation by residual nucleotides during the nucleotide flow step; mixed template beads; and overrepresented single templates distributed across different beads. The haploid nature of the P. falciparum genome provides a unique opportunity to gain insight into systematic biases that may be introduced by 454 technology in an (A+T) rich genome. In a predominantly haploid organism, neither single nucleotide base variants that differ from both parental sequences nor heterozygous alleles at a given genome position are expected. Two types of variant base calls were detected in the progeny: de novo SNPs and alternate SNP positions (i.e. multiple base calls per genome position).
De novo SNPs are defined here as bases in the progeny that are different from those of either parent. Eighty (0.0011%) and 128 (0.0017%) de novo SNPs were detected for 7C126 and SC05 respectively in high quality mapped base positions (i.e. single base call, ≥ 10 reads; average QS ≥ 30). The number of de novo SNPs detected is considerably lower than the number of sequencing errors expected at a phred quality score of 30 (estimated probability of an incorrect call = 0.1%).
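The 0.1% figure follows from the standard Phred definition, Q = -10 log10(P), where P is the probability of an incorrect base call. The short sketch below only illustrates that conversion; the function names are hypothetical and are not part of the analysis pipeline described here.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability of an incorrect base call for Phred quality score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality score corresponding to error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    p_q30 = phred_to_error_probability(30)
    # Q30 corresponds to one expected error per 1,000 base calls.
    print(f"Q30 error probability: {p_q30:.4f} ({p_q30:.1%})")
```

Under this definition, the observed de novo fractions (0.0011% and 0.0017%) are roughly two orders of magnitude below the per-base error expectation at Q30.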
Most de novo SNPs in the progeny (89% in 7C126 and 93% in SC05) occurred at positions that are identical in the parents. Eight loci were chosen and sequenced by traditional Sanger (capillary) methods in Dd2, HB3, SC05 and 7C126 (Additional file 5). For six of these, the progeny base was the same as a parent, indicating a 454 sequencing error. For the other 2 positions, the SNPs detected as de novo were concordant with a parent sequenced in our laboratory, indicating either differences between the clone used for original WGS and the clone of Dd2 strain used for resequencing in our lab, a sequencing error in the parental WGS sequence [8, 10], or a sequence alignment error. This implies an error rate of 0.07/10 kb in high quality single allele SNPs.
Local sequence context affects genome sequence coverage as well as base call errors in NGS platforms. To investigate the sequence characteristics connected with the de novo base calls, we searched for local sequence context, such as association with homopolymer tracts, because most 454 sequencing errors occur in homopolymer tracts [37, 38] of 5 or more bases. None of the positions analyzed were located in homopolymer tracts of > 5 bps (Additional file 6). Of all detected de novo base calls, only 12 positions were found to be in homopolymer tracts of 3-5 bps (NNDN or NDNN, where N = nucleotide, and D = de novo SNP), indicating that de novo calls were not necessarily associated with homopolymer tracts. A majority (9/12) of the de novo SNPs detected in the homopolymer tracts were G (5/9) or C, suggesting these are unlikely to be sequencing errors arising from homopolymer bias.
The probability of base substitutions occurring due to a sequencing error has been studied extensively. Substitutions caused by sequencing errors occur at approximately 1-2 errors per 10 kb on the 454 platform. Among the de novo SNPs, the proportion of transversion nucleotide substitutions (7C126 = 48% and SC05 = 56%) was greater than the proportion of transitions (7C126 = 46% and SC05 = 41%) in both progeny genomes. We observed a bias in the base changes (Additional file 7): T > A was the most common type of base conversion in both progeny. C > T and G > A transitions were frequent, followed by the A > G and T > C transitions. The lack of SNP clustering coupled with the substitution biases may reflect Taq polymerase errors, and signal possible consequences of base mis-incorporation.
Both the parental genomes (derived from traditional Sanger based sequencing technology) and the two progeny genomes exhibited alternate SNP positions, i.e. multiple base calls at a single position. Of all positions uniquely mapped, the progeny clones sequenced in this study exhibited few positions with multiple base calls (7C126 = 0.1%, SC05 = 0.09%) compared to the parental genomes. We observed no specific location bias for the genome-wide distribution of the alternate SNP positions in either the parental or the progeny genomes.
Alternate SNP positions can be expected in a haploid genome as a consequence of amplification artifacts, sequencing errors, mis-mapped reads, or novel mutations occurring during in vitro culture. The primary base call of the alternate SNP positions was compared with the parental base calls at the position. The majority of the primary base calls were parental SNPs (approximately 80% of the allelic positions), while the majority of the secondary base calls were novel, i.e. non parental SNPs (7C126 = 74%, SC05 = 86%). The novel primary base calls in 7C126 were primarily transitions, while those in SC05 were primarily transversions. The uniformity of the distribution of alternate SNP positions and the type of substitutions observed in 7C126 suggested a base conversion bias: C > T (also seen in SC05) and G > A transitions were the most frequent, followed by T > A, A > G, and T > C substitutions (Additional file 8). This bias may reflect base mis-incorporation due to Taq polymerase errors. The technology relies on single bead, single template amplification. Therefore amplification artifacts are rare, relative to actual sequence differences. Some pyrosequencing errors are also reported due to base miscalls arising from mixed-template beads. These alternate SNP positions could represent potential new mutations. Alternatively, in the case of a multi-clonal heterogeneous population, there can be multiple independent high-quality reads with "normal" flowgrams which can give rise to alternate allelic positions. On the other hand, alternate allelic positions can also occur from paralogous sequences or repeats that are not present in the reference.
Copy number variant detection
Table: Five known copy number differences detected between parental strains using WGS read depth. Columns: No. of 2.5 kb windows; Significant windows (p < 0.001); Highest Chi-square value. Events, in row order: deletion in both; amplification in 7C126; deletion in both; amplification in 7C126; amplification in SC05.
Comparison of sequence data from the 454 GS FLX platform with genome sequence generated by conventional dideoxy-based sequencing demonstrates that the GS FLX data compare favorably to standard dideoxy-based sequencing for de novo assembly of an AT rich genome, because the assembly statistics were similar to those of the parental genomes. The high throughput SNP marker detection method using 454 technology substantially improved the efficiency of allele discovery and crossover detection compared to traditional markers (i.e. MS and RFLPs) used in linkage analysis. By sequencing the 23 Mb genomes of two haploid progeny clones derived from a genetic cross at more than 30× coverage, this investigation captured high resolution information on COs, NCOs and genetic variation within the progeny genomes. Our approach for surveying recombination in this predominantly haploid genetic system not only allows for genome wide detection and fine scale analysis of recombination products but also reveals potential details on CO interference and double strand break resolution.
Parasites, DNA Extraction, and Microsatellite genotyping
Plasmodium falciparum strains 3D7, Dd2, HB3, 7C126 and SC05 were thawed from genotyped source stocks and cultured at parasitemia suitable for DNA extraction. Parasites were grown at 37°C and 5% hematocrit in O+ human red blood cells using RPMI 1640 (Invitrogen, Carlsbad, CA) supplemented with 0.5% Albumax I (Invitrogen), 0.25% sodium bicarbonate (Mediatech, Inc., Manassas, VA) and 0.01 mg/ml gentamicin (Invitrogen) under an atmosphere of 90% nitrogen, 5% oxygen, and 5% carbon dioxide. Cultures were gassed every day, the media was changed every 2 days and parasitemia was maintained below a level of 5%. Total genomic DNA was isolated from frozen culture using standard phenol-chloroform extraction. Each parasite DNA was genotyped for a set of 8 microsatellite markers to ensure clonality and to confirm parasite identities.
Library production and shotgun sequencing
GS FLX Titanium shotgun libraries were made from genomic DNA according to the manufacturer's specifications at 454 Life Sciences (454 Life Sciences, Branford, CT). Briefly, sequencing was performed according to GS FLX standard protocols with the following modifications: due to the high (A+T) content of the P. falciparum genome, the concentration of thymidine in the sequencing reaction was increased to 1.4 times the recommended amount, and 150 cycles of sequencing were performed instead of the standard 100 cycles. Two GS FLX Titanium paired-end libraries (3 kb) were constructed and sequenced at 454 Life Sciences according to the manufacturer's specifications (454 Life Sciences, Branford, CT).
This SNP analysis scheme began with a comprehensive re-analysis of the trace reads of HB3 and Dd2 from the database before attempting to identify SNPs. We obtained read and quality sequences for the Dd2 and HB3 strains from the NCBI Trace Archive in May 2009. The reads from the Dd2 and HB3 strains were computationally trimmed using LUCY (parameters -error 0.05 0.50 -window 50 0.05 -bracket 10 0.10). The trimmed reads were aligned to the 3D7 reference assembly using SSAHA2 version 188.8.131.52 (-tags 1 -output cigar -diff 0 -identity 90.0 -best 1). The alignments were filtered using ssaha_cigar with default parameters. Custom Perl scripts were used to summarize the base call and quality information for all reads that map to each position of the reference genome. For each base call that occurs at a position, the coverage (number of reads) and quality scores are stored in a text file similar to the vertical multiple alignment (VMA) format. The base call with the most reads is considered the primary base call. If a second base is called with two or more supporting reads, then it is stored as a secondary base call.
The 454 reads from the 7C126 and SC05 strains were aligned to the 3D7 reference assembly using SSAHA2 (-tags 1 -output cigar -diff 0 -identity 90.0 -best 1). The alignments were filtered using ssaha_cigar with default parameters. The primary and secondary base calls and quality scores were summarized into VMA in a similar manner as the parental strains.
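The per-position summary described above can be sketched as follows. This is a minimal illustration in Python rather than the Perl scripts used in the study; the `PositionSummary` type, the function name, and the choice to average quality over the reads supporting the primary base are assumptions made for the example.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class PositionSummary(NamedTuple):
    primary: str                # base supported by the most reads
    primary_depth: int          # number of reads supporting the primary base
    primary_mean_quality: float
    secondary: Optional[str]    # second base, if it has enough support
    secondary_depth: int


def summarize_position(
    observations: List[Tuple[str, int]],
    min_secondary_reads: int = 2,
) -> Optional[PositionSummary]:
    """Summarize (base, quality) pairs from all reads aligned at one position."""
    if not observations:
        return None
    depth = Counter(base for base, _ in observations)
    ranked = depth.most_common()  # highest read count first
    primary, primary_depth = ranked[0]
    quals = [q for base, q in observations if base == primary]
    secondary, secondary_depth = None, 0
    # A second base is recorded only with two or more supporting reads.
    if len(ranked) > 1 and ranked[1][1] >= min_secondary_reads:
        secondary, secondary_depth = ranked[1]
    return PositionSummary(
        primary, primary_depth, sum(quals) / len(quals), secondary, secondary_depth
    )


if __name__ == "__main__":
    reads = [("A", 40), ("A", 30), ("A", 35), ("T", 20), ("T", 25), ("C", 10)]
    print(summarize_position(reads))
```

The `min_secondary_reads` parameter mirrors the two-read threshold stated for the parental data; a different threshold can be passed for other data sets.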
SNP calls and SNP verification
Parental SNP identification: The base calls and quality values of the sequence from the Dd2 and HB3 strains were considered at each position of the reference genome. We required each parent to have two or more reads with an average quality score of at least 30. Additionally, we required both parents to uniformly exhibit a single base at a position (no secondary base call). Positions that met all of these criteria, and at which the Dd2 base call differed from the HB3 base call, were considered candidate positions for progeny genotyping. The base calls and quality values of the 7C126 and SC05 strains were considered at each position determined above. We required a progeny strain to have a read depth of 10 or more with an average quality score of at least 30. Additionally, to call an allele for a progeny clone, we required it to uniformly exhibit a single base at the position (no secondary base call). Positions that met all of these criteria were considered valid.
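The filtering criteria above translate into a small set of predicates. The sketch below is illustrative only: the `Call` record and the function names are hypothetical, and the thresholds are the ones stated in the text (2 reads for parents, 10 reads for progeny, mean quality of at least 30, no secondary base call).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Call:
    base: str
    depth: int             # reads supporting the base call
    mean_quality: float    # average quality score of those reads
    has_secondary: bool = False


def passes(call: Optional[Call], min_depth: int, min_quality: float = 30.0) -> bool:
    """True if a call is a single, well-supported, high-quality base."""
    return (
        call is not None
        and call.depth >= min_depth
        and call.mean_quality >= min_quality
        and not call.has_secondary
    )


def is_candidate_snp(dd2: Optional[Call], hb3: Optional[Call]) -> bool:
    """Both parents pass the parental filters and carry different bases."""
    return passes(dd2, 2) and passes(hb3, 2) and dd2.base != hb3.base


def genotype_progeny(progeny: Optional[Call], dd2: Optional[Call],
                     hb3: Optional[Call]) -> Optional[str]:
    """Return 'Dd2', 'HB3' or 'de novo' at a valid position, else None."""
    if not is_candidate_snp(dd2, hb3) or not passes(progeny, 10):
        return None
    if progeny.base == dd2.base:
        return "Dd2"
    if progeny.base == hb3.base:
        return "HB3"
    return "de novo"


if __name__ == "__main__":
    dd2, hb3 = Call("A", 4, 38.0), Call("G", 3, 35.0)
    print(genotype_progeny(Call("G", 22, 36.5), dd2, hb3))  # inherits the HB3 allele
```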
To further confirm the SNP calls, we compared the SNP calls on an independent platform. In parallel to the work presented here, our lab developed a gene chip to resequence 45,000 SNPs cataloged in PlasmoDB (M.T. Ferdig, unpublished). Of the 24,585 SNPs identified in this study, 2,468 were encoded on the gene chip for direct comparison with 7C126 between the platforms.
At each valid position that we identified as a parental SNP, we classified each strain as inheriting the Dd2 or HB3 allele, or alternatively, a de novo allele. To validate the de novo SNPs detected, we PCR amplified eight 1 kb regions overlapping de novo SNP loci. Each amplicon was sequenced bi-directionally (forward and reverse) using standard dideoxy-based sequencing on an ABI 3730xl DNA Analyzer. Sequencing chromatograms were analyzed with Contig Express (Vector NTI Advance™ software, Life Technologies Corporation, Carlsbad, CA).
Recombination breakpoint prediction and verification
A sliding window approach was used for the prediction of recombination breakpoints. The filtered single allele calls were assessed in 15 SNP intervals. Allele frequencies for each bin were calculated. A CO was predicted when the allele frequency in a window transitioned from one allele type to another (100% allele frequency). NCOs were defined as a locus consisting of the opposite allele configuration within a larger surrounding region, and were predicted with strict criteria: the locus must contain at least 3 contiguous SNPs in the opposite allele configuration of the surrounding locus and must also include 8 of such SNPs in a 15 SNP window. This method will miss smaller NCOs involving < 3 contiguous SNPs.
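One way to read the CO rule is sketched below. It is a simplified interpretation, not the study's implementation: it assumes the window advances one SNP at a time, treats a window as informative only when all 15 SNPs carry the same parental allele, and reports a CO when two successive informative windows carry different parents. NCO detection is not covered. The function name and the input layout are hypothetical.

```python
from typing import List, Tuple

Allele = Tuple[int, str]              # (genomic position, inherited parent)
Crossover = Tuple[int, int, str, str]  # (left bound, right bound, from, to)


def predict_crossovers(alleles: List[Allele], window: int = 15) -> List[Crossover]:
    """Predict CO breakpoint intervals from position-sorted parental allele calls."""
    crossovers: List[Crossover] = []
    last_pure = None  # (parent, index of last SNP in the most recent pure window)
    for start in range(len(alleles) - window + 1):
        parents = {parent for _, parent in alleles[start:start + window]}
        if len(parents) != 1:
            continue  # mixed window: allele frequency is below 100%
        parent = next(iter(parents))
        if last_pure is not None and last_pure[0] != parent:
            # Breakpoint lies between the last SNP of the previous pure window
            # and the first SNP of the current pure window.
            left = alleles[last_pure[1]][0]
            right = alleles[start][0]
            crossovers.append((left, right, last_pure[0], parent))
        last_pure = (parent, start + window - 1)
    return crossovers


if __name__ == "__main__":
    calls = [(i * 1000, "Dd2") for i in range(20)]
    calls += [(i * 1000, "HB3") for i in range(20, 40)]
    print(predict_crossovers(calls))
```

Under this reading, a short tract of opposite alleles inside a parental block never produces a pure window of the other parent, so it is not reported as a CO; a complex breakpoint simply widens the reported interval.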
The distance between events was calculated as the distance between the beginning of the breakpoint window of one event and the beginning of the breakpoint window of the next event. The chromosomal alignments at CO and NCO regions were visualized using Integrative Genomics Viewer. CO and NCO regions were visually inspected in comparison with the parental genomes for quality of read alignment and SNP distribution.
Analysis of single nucleotide base variants
Custom Perl scripts were used to analyze single nucleotide base variants including de novo SNPs and alternate allelic positions. De novo SNPs were defined as called bases in the progeny that are different from those of either of the parents. All de novo SNPs were checked for association with homopolymer tracts of > 5 bps (NDNNNN or NNDNNN or NNNDNN or NNNNDN where N = nucleotide, and D = de novo SNP) and 3 to 5 bps (NNDN or NDNN). Alternate allelic positions were defined for parental genomes as well as the progeny genomes. The base with the most reads was considered the primary allele, while the alternate base was considered the secondary allele at that position. Two different sets of read cutoffs were used to differentiate the secondary allele in parent (at least 2 supporting reads) and progeny (at least 5 supporting reads). Both de novo SNPs and the primary allele in progeny were analyzed for base substitution changes in comparison with the parental base using custom Perl scripts.
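The substitution analysis rests on the standard distinction between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse). A minimal Python sketch of that classification and tally is given below; the function names are hypothetical and the example input is invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(parental: str, observed: str) -> str:
    """Classify a single-base change as a 'transition' or a 'transversion'."""
    parental, observed = parental.upper(), observed.upper()
    valid = PURINES | PYRIMIDINES
    if parental not in valid or observed not in valid or parental == observed:
        raise ValueError(f"not a single-base substitution: {parental}>{observed}")
    pair = {parental, observed}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tally_substitutions(changes: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count changes by substitution class and by specific base change (e.g. 'T>A')."""
    by_class: Counter = Counter()
    by_change: Counter = Counter()
    for parental, observed in changes:
        by_class[classify_substitution(parental, observed)] += 1
        by_change[f"{parental.upper()}>{observed.upper()}"] += 1
    return by_class, by_change


if __name__ == "__main__":
    classes, changes = tally_substitutions([("T", "A"), ("C", "T"), ("G", "A"), ("T", "A")])
    print(classes, changes.most_common(1))
```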
Large structural event detection
A custom 385 k NimbleGen array, designed for the P. falciparum 3D7 reference genome (PlasmoDB, 2006) using the standard CGH probe design protocol, was used. The array comprises 385,585 probes semi-tiled across the genome at a 4 bp interval spacing with a minimum probe length of 45 bp and a maximum length of 85 bp. 7C126 and SC05 were hybridized with reference 3D7. DNA fragmentation, labeling, hybridization, washing, and scanning were carried out using the standard NimbleGen CGH protocol at the Genomics Core Facility (University of Notre Dame, Notre Dame, IN). The microarrays were hybridized and washed in a NimbleGen Hybridization System 4 (NimbleGen Systems, Inc., Madison, WI). Images were acquired using the NimbleGen MS 200 Microarray Scanner (NimbleGen Systems, Inc., Madison, WI) at a 5 μm resolution. Probe intensity values were extracted from scanned images using NimbleScan extraction software (NimbleGen Systems, Inc., Madison, WI). The Cy3 and Cy5 signal intensities were normalized according to the standard NimbleGen protocol (http://www.nimblegen.com/products/lit/cgh_userguide_v6p0.pdf). The normalized values were used to calculate log2 ratio values, which were used for CNV detection with a segmentation model based on a Gaussian framework.
CNV detection with read depth analysis
Five characterized copy number differences were used to test structural variation detection with the 454 shotgun read library in the (A+T) biased genome. Read locations along chromosomes were derived from the CIGAR alignments (see section iii Read Mapping) used for SNP discovery; a read was assigned to a non-overlapping 2.5 kb interval if at least 85% of its length aligned to that interval. To compute CNVs, we used a simple 2 × 2 Chi square test. We compared the proportion of reads in each non-overlapping 2.5 kb interval relative to all reads that mapped to all other intervals on the chromosome, and compared each window between the two progeny. The resulting statistic was converted to a p-value based on a Chi square distribution with two degrees of freedom, but not corrected for multiple comparisons. This computational approach is more similar to array-based detection and digital expression (e.g., Man et al., 2000) than to more traditional read depth approaches (e.g., Bailey et al., 2002) and was chosen to detect large (5 kb or larger) structural variation known to occur in the progeny genomes.
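A minimal sketch of the window assignment and the 2 × 2 test follows, using only the Python standard library. The function names, the 0-based half-open read coordinates, and the example counts are assumptions for illustration. The text states that two degrees of freedom were used for the p-value; a 2 × 2 contingency table is conventionally evaluated with one, so the sketch exposes the degrees of freedom as a parameter rather than fixing either choice.

```python
import math
from typing import Optional


def assign_window(start: int, end: int, window_size: int = 2500,
                  min_fraction: float = 0.85) -> Optional[int]:
    """Index of the window holding >= min_fraction of read [start, end), else None."""
    length = end - start
    for w in range(start // window_size, (end - 1) // window_size + 1):
        overlap = min(end, (w + 1) * window_size) - max(start, w * window_size)
        if overlap >= min_fraction * length:
            return w
    return None


def chi_square_2x2(in_a: int, other_a: int, in_b: int, other_b: int) -> float:
    """Pearson Chi square statistic for reads in a window vs. elsewhere, two samples."""
    n = in_a + other_a + in_b + other_b
    denom = (in_a + other_a) * (in_b + other_b) * (in_a + in_b) * (other_a + other_b)
    if denom == 0:
        return 0.0
    return n * (in_a * other_b - other_a * in_b) ** 2 / denom


def chi_square_p_value(statistic: float, df: int = 2) -> float:
    """Upper-tail p-value; closed forms exist for df = 1 and df = 2."""
    if df == 2:
        return math.exp(-statistic / 2)
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2))
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")


if __name__ == "__main__":
    # Hypothetical counts: one progeny has twice the reads in a window.
    stat = chi_square_2x2(200, 800, 100, 900)
    print(f"chi2 = {stat:.2f}, p (df=2) = {chi_square_p_value(stat):.2e}")
```

Because the p-values are not corrected for multiple comparisons, a stringent threshold such as the p < 0.001 cutoff used for significant windows is the only guard against false positives across the many windows tested.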
- GS FLX: Genome Sequencer FLX
- WGS: whole genome shotgun
- NGS: next generation sequencing
- DSB: double strand break
- SNP: single nucleotide polymorphism
- indel: insertion or deletion
- CGH: comparative genomic hybridization
- CNV: copy number variant
- VMA: vertical multiple alignment
- NCBI: National Center for Biotechnology Information
We thank Dr. Thomas Wellems for providing the progeny clones, and Drs. David Severson and Frank Collins for initiating the collaboration with Roche. This analysis was funded by NIH grants AI055035 and AI071121 to MTF. We are grateful for technical support from the Genomics and Bioinformatics core facilities funded by the University of Notre Dame strategic research initiative.
- Ragoussis J: Genotyping Technologies for Genetic Research. Annual Review of Genomics and Human Genetics. 2009, 10 (1): 117-133. 10.1146/annurev-genom-082908-150116.View ArticlePubMedGoogle Scholar
- Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM: High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature. 2008, 454 (7203): 479-485. 10.1038/nature07135.View ArticlePubMedPubMed CentralGoogle Scholar
- West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA, Michelmore RW: High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome Res. 2006, 16 (6): 787-795. 10.1101/gr.5011206.View ArticlePubMedPubMed CentralGoogle Scholar
- Singer T, Fan Y, Chang H, Zhu T, Hazen SP, Briggs SP: A High-Resolution Map of Arabidopsis Recombinant Inbred Lines by Whole-Genome Exon Array Hybridization. PLoS Genet. 2006, 2 (9): e144-10.1371/journal.pgen.0020144.View ArticlePubMedPubMed CentralGoogle Scholar
- McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM, Hoen DR, Bureau TE, Stokowski R, Ballinger DG, Frazer KA, Cox DR, Padhukasahasram B, Bustamante CD, Weigel D, Mackill DJ, Bruskiewich RM, Ratsch G, Buell CR, Leung H, Leach JE: Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proc Natl Acad Sci USA. 2009, 106 (30): 12273-12278. 10.1073/pnas.0900992106.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang X, Feng Q, Qian Q, Zhao Q, Wang L, Wang A, Guan J, Fan D, Weng Q, Huang T, Dong G, Sang T, Han B: High-throughput genotyping by whole-genome resequencing. Genome Res. 2009, 19 (6): 1068-1076. 10.1101/gr.089516.108.View ArticlePubMedPubMed CentralGoogle Scholar
- Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J: SNP detection for massively parallel whole-genome resequencing. Genome Res. 2009, 19 (6): 1124-1132. 10.1101/gr.088013.108.View ArticlePubMedPubMed CentralGoogle Scholar
- Jiang H, Yi M, Mu J, Zhang L, Ivens A, Klimczak LJ, Huyen Y, Stephens RM, Su XZ: Detection of genome-wide polymorphisms in the AT-rich Plasmodium falciparum genome using a high-density microarray. BMC Genomics. 2008, 9: 398. 10.1186/1471-2164-9-398.
- Neafsey DE, Schaffner SF, Volkman SK, Park D, Montgomery P, Milner DA, Lukens A, Rosen D, Daniels R, Houde N, Cortese JF, Tyndall E, Gates C, Stange-Thomann N, Sarr O, Ndiaye D, Ndir O, Mboup S, Ferreira MU, Moraes Sdo L, Dash AP, Chitnis CE, Wiegand RC, Hartl DL, Birren BW, Lander ES, Sabeti PC, Wirth DF: Genome-wide SNP genotyping highlights the role of natural selection in Plasmodium falciparum population divergence. Genome Biol. 2008, 9 (12): R171. 10.1186/gb-2008-9-12-r171.
- Dharia NV, Sidhu AB, Cassera MB, Westenberger SJ, Bopp SE, Eastman RT, Plouffe D, Batalov S, Park DJ, Volkman SK, Wirth DF, Zhou Y, Fidock DA, Winzeler EA: Use of high-density tiling microarrays to identify mutations globally and elucidate mechanisms of drug resistance in Plasmodium falciparum. Genome Biol. 2009, 10 (2): R21. 10.1186/gb-2009-10-2-r21.
- Tan JC, Patel JJ, Tan A, Blain JC, Albert TJ, Lobo NF, Ferdig MT: Optimizing comparative genomic hybridization probes for genotyping and SNP detection in Plasmodium falciparum. Genomics. 2009, 93 (6): 543-550. 10.1016/j.ygeno.2009.02.007.
- Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419 (6906): 498-511. 10.1038/nature01097.
- Mardis ER: The impact of next-generation sequencing technology on genetics. Trends Genet. 2008, 24 (3): 133-141.
- Rothberg JM, Leamon JH: The development and impact of 454 sequencing. Nat Biotechnol. 2008, 26 (10): 1117-1124. 10.1038/nbt1485.
- San Filippo J, Sung P, Klein H: Mechanism of eukaryotic homologous recombination. Annu Rev Biochem. 2008, 77: 229-257. 10.1146/annurev.biochem.77.061306.125255.
- Chen JM, Cooper DN, Chuzhanova N, Ferec C, Patrinos GP: Gene conversion: mechanisms, evolution and human disease. Nat Rev Genet. 2007, 8 (10): 762-775. 10.1038/nrg2193.
- Barbazuk WB, Emrich S, Schnable PS: SNP Mining from Maize 454 EST Sequences. Cold Spring Harb Protoc. 2007.
- van Orsouw NJ, Hogers RC, Janssen A, Yalcin F, Snoeijers S, Verstege E, Schneiders H, van der Poel H, van Oeveren J, Verstegen H, van Eijk MJ: Complexity reduction of polymorphic sequences (CRoPS): a novel approach for large-scale polymorphism discovery in complex genomes. PLoS One. 2007, 2 (11): e1172. 10.1371/journal.pone.0001172.
- Novaes E, Drost DR, Farmerie WG, Pappas GJ, Grattapaglia D, Sederoff RR, Kirst M: High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics. 2008, 9: 312. 10.1186/1471-2164-9-312.
- Malhi RS, Sickler B, Lin D, Satkoski J, Tito RY, George D, Kanthaswamy S, Smith DG: MamuSNP: a resource for Rhesus Macaque (Macaca mulatta) genomics. PLoS One. 2007, 2 (5): e438. 10.1371/journal.pone.0000438.
- Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski JR, Chinault C, Song XZ, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM, Margulies M, Weinstock GM, Gibbs RA, Rothberg JM: The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008, 452 (7189): 872-876. 10.1038/nature06884.
- Holt KE, Parkhill J, Mazzoni CJ, Roumagnac P, Weill FX, Goodhead I, Rance R, Baker S, Maskell DJ, Wain J, Dolecek C, Achtman M, Dougan G: High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat Genet. 2008, 40 (8): 987-993. 10.1038/ng.195.
- Hyten DL, Cannon SB, Song Q, Weeks N, Fickus EW, Shoemaker RC, Specht JE, Farmer AD, May GD, Cregan PB: High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence. BMC Genomics. 2010, 11: 38. 10.1186/1471-2164-11-38.
- Wellems TE, Panton LJ, Gluzman IY, do Rosario VE, Gwadz RW, Walker-Jonah A, Krogstad DJ: Chloroquine resistance not linked to mdr-like genes in a Plasmodium falciparum cross. Nature. 1990, 345 (6272): 253-255. 10.1038/345253a0.
- Su X, Ferdig MT, Huang Y, Huynh CQ, Liu A, You J, Wootton JC, Wellems TE: A genetic map and recombination parameters of the human malaria parasite Plasmodium falciparum. Science. 1999, 286 (5443): 1351-1353. 10.1126/science.286.5443.1351.
- Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, Turner DJ: Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods. 2009, 6 (4): 291-295. 10.1038/nmeth.1311.
- PlasmoDB: a functional genomic database for malaria parasites. [http://plasmodb.org/plasmo/]
- NCBI Trace Archive. [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?]
- Martinez-Perez E, Colaiácovo MP: Distribution of meiotic recombination events: talking to your neighbors. Curr Opin Genet Dev. 2009, 19 (2): 105-112. 10.1016/j.gde.2009.02.005.
- Qi J, Wijeratne AJ, Tomsho LP, Hu Y, Schuster SC, Ma H: Characterization of meiotic crossovers and gene conversion by whole-genome sequencing in Saccharomyces cerevisiae. BMC Genomics. 2009, 10: 475. 10.1186/1471-2164-10-475.
- Shinohara M, Oh SD, Hunter N, Shinohara A: Crossover assurance and crossover interference are distinctly regulated by the ZMM proteins during yeast meiosis. Nat Genet. 2008, 40 (3): 299-309. 10.1038/ng.83.
- Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, Ostrowski EA, Liu Y, Weinstock GM, Wheeler DA, Gibbs RA, Yu F: A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Res. 2010, 20 (2): 273-280. 10.1101/gr.096388.109.
- Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum C, Jaffe DB: Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res. 2008, 18 (5): 763-770. 10.1101/gr.070227.107.
- Holt RA, Jones SJ: The new paradigm of flow cell sequencing. Genome Res. 2008, 18 (6): 839-846. 10.1101/gr.073262.107.
- Voelkerding KV, Dames SA, Durtschi JD: Next-generation sequencing: from basic research to diagnostics. Clin Chem. 2009, 55 (4): 641-658. 10.1373/clinchem.2008.112789.
- Harismendy O, Ng PC, Strausberg RL, Wang X, Stockwell TB, Beeson KY, Schork NJ, Murray SS, Topol EJ, Levy S, Frazer KA: Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol. 2009, 10 (3): R32. 10.1186/gb-2009-10-3-r32.
- Moore MJ, Dhingra A, Soltis PS, Shaw R, Farmerie WG, Folta KM, Soltis DE: Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol. 2006, 6: 17. 10.1186/1471-2229-6-17.
- Wicker T, Schlagenhauf E, Graner A, Close TJ, Keller B, Stein N: 454 Sequencing Put to the Test using the Complex Genome of Barley. BMC Genomics. 2006, 7: 275. 10.1186/1471-2164-7-275.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437 (7057): 376-380.
- Campbell PJ, Pleasance ED, Stephens PJ, Dicks E, Rance R, Goodhead I, Follows GA, Green AR, Futreal PA, Stratton MR: Subclonal phylogenetic structures in cancer revealed by ultra-deep sequencing. Proc Natl Acad Sci USA. 2008, 105 (35): 13081-13086. 10.1073/pnas.0801523105.
- Quinlan AR, Stewart DA, Stromberg MP, Marth GT: Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nat Methods. 2008, 5 (2): 179-181. 10.1038/nmeth.1172.
- Huang W, Marth G: EagleView: a genome assembly viewer for next-generation sequencing technologies. Genome Res. 2008, 18 (9): 1538-1543. 10.1101/gr.076067.108.
- Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics. 2001, 17 (12): 1093-1104. 10.1093/bioinformatics/17.12.1093.
- Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh YP, Hahn MW, Nista PM, Jones CD, Kern AD, Dewey CN, Pachter L, Myers E, Langley CH: Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 2007, 5 (11): e310. 10.1371/journal.pbio.0050310.
- Integrative Genomics Viewer. [http://www.broadinstitute.org/]
- Selzer RR, Richmond TA, Pofahl NJ, Green RD, Eis PS, Nair P, Brothman AR, Stallings RL: Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase resolution using fine-tiling oligonucleotide array CGH. Genes Chromosomes Cancer. 2005, 44 (3): 305-319. 10.1002/gcc.20243.
- Picard F, Robin S, Lavielle M, Vaisse C, Daudin JJ: A statistical approach for array CGH data analysis. BMC Bioinformatics. 2005, 6: 27. 10.1186/1471-2105-6-27.
- Man MZ, Wang X, Wang Y: POWER_SAGE: comparing statistical tests for SAGE experiments. Bioinformatics. 2000, 16 (11): 953-959. 10.1093/bioinformatics/16.11.953.
- Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science. 2002, 297 (5583): 1003-1007. 10.1126/science.1072047.
- Volkman SK, Sabeti PC, Decaprio D, Neafsey DE, Schaffner SF, Milner DA, Daily JP, Sarr O, Ndiaye D, Ndir O, Mboup S, Duraisingh MT, Lukens A, Derr A, Stange-Thomann N, Waggoner S, Onofrio R, Ziaugra L, Mauceli E, Gnerre S, Jaffe DB, Zainoun J, Wiegand RC, Birren BW, Hartl DL, Galagan JE, Lander ES, Wirth DF: A genome-wide map of diversity in Plasmodium falciparum. Nat Genet. 2006, 39: 113-119. 10.1038/ng1930.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.