A novel multifunctional oligonucleotide microarray for Toxoplasma gondii
© Bahl et al; licensee BioMed Central Ltd. 2010
Received: 19 March 2010
Accepted: 25 October 2010
Published: 25 October 2010
Microarrays are invaluable tools for genome interrogation, SNP detection, and expression analysis, among other applications. Such broad capabilities would be of value to many pathogen research communities, although the development and use of genome-scale microarrays is often a costly undertaking. Therefore, effective methods for reducing unnecessary probes while maintaining or expanding functionality would be relevant to many investigators.
Taking advantage of available genome sequences and annotation for Toxoplasma gondii (a pathogenic parasite responsible for illness in immunocompromised individuals) and Plasmodium falciparum (a related parasite responsible for severe human malaria), we designed a single oligonucleotide microarray capable of supporting a wide range of applications at relatively low cost, including genome-wide expression profiling for Toxoplasma, and single-nucleotide polymorphism (SNP)-based genotyping of both T. gondii and P. falciparum. Expression profiling of the three clonotypic lineages dominating T. gondii populations in North America and Europe provides a first comprehensive view of the parasite transcriptome, revealing that ~49% of all annotated genes are expressed in parasite tachyzoites (the acutely lytic stage responsible for pathogenesis) and 26% of genes are differentially expressed among strains. A novel design utilizing few probes provided high confidence genotyping, used here to resolve recombination points in the clonal progeny of sexual crosses. Recent sequencing of additional T. gondii isolates identifies >620 K new SNPs, including ~11 K that intersect with expression profiling probes, yielding additional markers for genotyping studies, and further validating the utility of a combined expression profiling/genotyping array design. Additional applications facilitating SNP and transcript discovery, alternative statistical methods for quantifying gene expression, etc. are also pursued at pilot scale to inform future array designs.
In addition to providing an initial global view of the T. gondii transcriptome across major lineages and permitting detailed resolution of recombination points in a historical sexual cross, the multifunctional nature of this array also allowed opportunities to exploit probes for purposes beyond their intended use, enhancing analyses. This array is in widespread use by the T. gondii research community, and several aspects of the design strategy are likely to be useful for other pathogens.
In recent years, annotated genome sequences have become available for many important human and veterinary pathogens, facilitating the exploration of organismal biology. Genome-wide microarrays enable a variety of RNA- and DNA-based queries, contributing to our understanding of genome function and evolution [1, 2]. For example, a highly time-resolved expression profiling series through asexual blood stages of the human malaria parasite Plasmodium falciparum, using spotted oligonucleotide arrays, revealed a transcriptional program tightly coupled to the cell cycle, and further studies have elucidated responses to a variety of drug treatment regimens [4, 5]. Higher density photolithographic arrays provide greater resolution of the transcriptional landscape in P. falciparum, and have been used to assess genomic variation across multiple isolates [6, 7]. A newer generation of tiling arrays and 'next-generation' sequencing is expected to support further applications in gene and SNP discovery, expression profiling, etc. Such studies have helped to drive research efforts in many areas, including the prioritization of targets for drug, vaccine, and diagnostic development. Similar analyses would clearly be valuable for many pathogens, although the development and use of microarrays can be an expensive undertaking.
In order to address the diverse needs of the Toxoplasma gondii research community, we have developed a custom Affymetrix array for this protozoan parasite, a prominent source of neurological birth defects during congenital infection, and a cause of encephalitis in immunosuppressed patients. T. gondii provides an attractive organism for exploring the utility of mixed use microarrays, for several reasons. First, the parasite genome is relatively small (~65 Mb), and an annotated reference sequence is available [10, 11]. Second, a substantial collection of ESTs and SAGE tags from several strains and life cycle stages [12, 13] facilitates the assignment of ~8,000 gene models, and provides the basis for validating expression profiling studies. Third, ESTs from multiple strains permit identification of ~3,400 candidate SNPs, which have now been validated through additional genome sequencing data that became available in the course of the present study. Fourth, while sexual recombination plays a significant role in generating parasite diversity, including variation in virulence and other important phenotypes, T. gondii replicates as a haploid, greatly reducing the probe content required for genotyping. Finally, while all of the above characteristics apply to other pathogens as well (including Plasmodium spp.), excellent experimental systems are available for T. gondii, permitting cell and molecular biological studies, forward and reverse genetics, and investigation of host-parasite interactions.
Taking advantage of these features, we have designed a novel multifunctional array which enables the following goals: global expression profiling of parasite genes (both nuclear and organellar), and simultaneous analysis of relevant host cell genes; genome-wide high-resolution genotyping; and pilot-scale studies for non-coding regions (promoters, introns, antisense RNAs), alternative expression metrics (exon-level profiling), validation of gene annotation, and polymorphism and transcript discovery. This array also supports inexpensive and efficient genotyping of malaria parasites, based on ~2 K SNPs distributed throughout the P. falciparum genome [17, 18]. Despite the multifunctional nature of the completed array, low cost and ease of experimental use were maintained, maximizing utility for the broader T. gondii and P. falciparum research communities.
We have utilized these arrays to provide the first global view of tachyzoite (lytic) stage gene expression for representatives of the three dominant T. gondii lineages found in Europe and North America [19, 20], greatly increasing our knowledge of gene expression differences between clonotypes. Further, we describe methods for high-resolution genotyping of SNPs from T. gondii, enabled by complementing non-redundant genotyping probesets with individual expression profiling probes that intersect SNPs uncovered from recent sequencing of additional T. gondii isolates, validating the utility of a combined expression profiling/genotyping array design. Over 5,000 chosen SNPs are used to demonstrate high-resolution mapping of crossover points in the progeny of a historical sexual cross. Additionally, we provide data on select pilot-scale applications, including an exon-level analysis that generally supports the current (mainly computationally predicted) Toxoplasma gene models, and SNP discovery in the T. gondii plastid (apicoplast).
Application (for T. gondii unless otherwise indicated)
# of features
total # probes
% of chip
nuclear coding genes (3' biased)2
nuclear non-coding genes
apicoplast organellar genome (nt)
mitochondrial organellar genome (nt)
all exons (chr Ib only)3
all introns (chr Ib only)
antisense probes (opposite CDS; chr Ib only)
ESTs without predicted gene models (nt)
ORFs with BLASTX or TBLASTN hits (nt)
Expression Profiling (host species)
human (immune response & housekeeping)4
mouse (immune response & housekeeping)4
cat (housekeeping genes)
T. gondii genetic markers
SNPs inferred from T. gondii ESTs, etc
P. falciparum genetic markers
SFP discovery on 24 selected genes5
promoters (for ChIP) on 12 selected genes6
commonly used transgene reporters7
human & mouse normalization probes
yeast (housekeeping & spike-in probes)
mismatch probes (genes on chr Ib)
surrogate mismatch (background) probes
Probe design and selection required balancing space constraints on the array, a desire to employ standard well-supported experimental methods and analysis algorithms, and new opportunities afforded by custom design. Standard Affymetrix algorithms were used to select probes for traditional applications, including global parasite expression profiling, and genotyping of the several hundred well-characterized genetic markers previously reported for T. gondii. This allows for utilization of readily available protocols and software for labeling, hybridization, and analysis. For gene discovery and high-resolution genotyping applications, power analyses suggested that a lower degree of probe redundancy than commonly used in other systems would be sufficient for T. gondii and P. falciparum, which have relatively small genomes and replicate as haploids. Finally, pilot-scale projects were incorporated to generate preliminary data for several additional applications, including a comparison of methods for transcript profiling and analysis, examination of antisense and intron transcription, chromatin immunoprecipitation studies, expression of selected host genes, and polymorphism detection in highly variable genes.
Global Parasite Expression Profiling
Expression profiling of the ~8,000 genes identified in the parasite genome (reference strain ME49) is of general interest to the T. gondii research community, enabling the correlation of isolate-specific differences in gene expression with differences in virulence, drug sensitivity, differentiation, and other aspects of parasite biology [22, 23]. In order to facilitate such experiments, using commonly available reagents and analysis tools, we employed a standard gene expression profiling design, using eleven 3'-biased probes per gene. A perfect match only (PM-only) design was selected, as software supporting such designs is widely available, and exhibits comparable performance to mismatch corrected (PM-MM) schemes across a wide dynamic range [25, 26]. The accuracy of expression measures based on PM-only design was confirmed using exogenous spike-in controls, and by PM-MM analysis of genes on chromosome Ib (blue vs. gray in Additional File 1). In addition to profiling the nuclear genome, the mitochondrial and apicoplast genomes were tiled at 25 nt density on alternating strands (using the sequence from strain RH), allowing comprehensive expression analysis for these organellar genomes.
As indicated in Table 1, 7,793 T. gondii genes were annotated in the draft 3 nuclear genome sequence, and 3'-biased PM expression profiling probes were designed for all of these genes. In order to evaluate array performance, transcript abundance for in vitro-cultivated T. gondii tachyzoites was compared with information available from three alternative sources: (i) random cDNAs from large-scale unbiased EST sequencing projects [12, 27], (ii) cDNA abundance inferred by SAGE (serial analysis of gene expression), and (iii) a microarray study using spotted clones corresponding to ~500 genes. Because none of these methods was carried out at sufficient depth to identify all transcription units, evaluated transcripts were binned into three groups based on expression level (see Methods). As shown in Additional File 2, this analysis shows good concordance between our array and each of the other three platforms, given our selected binning, over a dynamic range of >100-fold in transcript abundance, indicating reliable performance of the new array.
Gene Expression in Toxoplasma gondii.
Evidence for expression in tachyzoites:
Genes (total = 7793)
RH (type I)
Pru (type II)
VEG (type III)
SAGE tag studies
Spotted cDNA arrays
Biological replicates display extremely high concordance across the full range of expression, as shown in Figure 1A. The accompanying tables list genes exhibiting the most highly discordant hybridization patterns in pairwise between-strain comparisons (such queries may also be conducted at ToxoDB.org, using parameters specified by the user). Interestingly, these lists are highly enriched in rhoptry proteins, which are known to play important roles in parasite virulence and pathogenesis [30, 31]. Note, however, that many rhoptry proteins are also highly polymorphic, which may in some cases affect hybridization profiles, since expression probes on the array were based on the sequence of type II strain ME49 (asterisks in tables).
Extracting all genes exhibiting differential expression in any pairwise comparison at a P-value of 10⁻³ (adjusted for multiple testing) yields a total of 5,307 genes (68% of the genome). Further filtering excluded genes that changed <2-fold, were unexpressed (at a 10% FDR), or were interrogated by a highly polymorphic probeset, defined as one having SNPs in ≥4/11 probes (an empirically determined threshold; see genotyping methods for a description of how polymorphic probes were identified). This leaves 2,078 genes displaying statistically significant between-strain differences in expression (26% of all genes). Of these, a single outlier strain could be assigned for 1,239 genes; as indicated in Figure 1C, RH is the outlier in 23%, Pru in 27%, and VEG in 9% of the 2,078 genes. Down-regulation is much more common in Pru (P-value < 2e-16), while no statistical significance is detected with respect to direction of regulation in RH or VEG.
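The successive filters described above can be sketched as a simple predicate over per-gene summaries. This is a minimal illustration, not the analysis code used in the study: the GeneResult record, its field names, and the choice of inclusive comparisons at each threshold are assumptions made for the example; only the thresholds themselves (adjusted P-value of 10⁻³, 2-fold change, SNPs in ≥4 of 11 probes) come from the text.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene summary; field names are illustrative only."""
    gene_id: str
    adj_p: float        # smallest adjusted P-value over all pairwise strain comparisons
    fold_change: float  # largest pairwise fold change, expressed as a value >= 1
    expressed: bool     # passed the presence call (10% FDR)
    snp_probes: int     # how many of the 11 expression probes overlap a SNP


def passes_filters(gene, p_cut=1e-3, min_fold=2.0, polymorphic_at=4):
    """Return True if the gene survives all four filters described in the text."""
    return (
        gene.adj_p <= p_cut                 # significant in some pairwise comparison
        and gene.fold_change >= min_fold    # drop genes that changed < 2-fold
        and gene.expressed                  # drop unexpressed genes
        and gene.snp_probes < polymorphic_at  # drop highly polymorphic probesets
    )


def filter_genes(genes):
    """Return the IDs of genes that pass every filter, in input order."""
    return [g.gene_id for g in genes if passes_filters(g)]
```

Applied to the array data, a filter of this kind would reduce the 5,307 genes significant at the P-value threshold to the 2,078 genes reported above.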
Strain-specific differential expression in Toxoplasma gondii.
Number of genes
Significant up- or down-regulation in:
Expressed in tachyzoites2
Any strain ( significance 3 )
RH (type I)
Pru (type II)
VEG (type III)
All genes in genome
GO-annotated genes 1 (process annotations only)
Gene-GO Slim mappings 1
177 0 .02
Other metabolic process
76 0 .04
Other biological process
Global High-Resolution Genotyping
In the field, T. gondii populations are characterized by a largely clonal structure, with most strains isolated from North America and Europe falling into one of three dominant clonotypes referred to as types I, II, and III [19, 20]. These clonotypes show low intra-lineage polymorphism, but inter-lineage polymorphism of ~1-2%. Variation is dominated by biallelic polymorphisms, and several hundred well-characterized RFLPs, microsatellites, and other markers have been used to map the genetic basis of lineage-specific phenotypes such as virulence [21, 30]. Genotyping by RFLP analysis is laborious, providing a bottleneck for mapping studies. We therefore incorporated probes for hybridization-based SNP genotyping onto the microarray, taking advantage of available space left over after the design of probes for expression profiling.
The final tier of genotyping design takes advantage of the fortuitous overlap between the 85,723 expression-profiling probes described above and 610,137 SNPs identified by whole genome alignment of three parasite genome sequences (type I strain GT1 and type III strain VEG, in addition to the type II reference strain ME49 (ToxoDB.org); Additional File 3). When these additional genomes became available (subsequent to chip design), it was determined that 10,903 expression profiling probes encompass SNPs. As 11 probes were used per gene for expression studies, and most analysis algorithms are quite robust with respect to individual outliers in the data, these polymorphisms have little impact on expression profiling (data not shown). Such single feature polymorphisms (SFPs; [36, 37]) nevertheless provide a high density set of probes for interrogating polymorphisms, albeit at reduced confidence relative to the 4- and 40-probe designs, as each is based on a single probe for a single allele that is not necessarily centered on the SNP (lower vertical bars on chromosomes in Figure 2).
These three classes of SNP analysis probes were screened to remove probesets that failed to consistently yield correct calls across a training sample spanning all three lineages (Additional File 4, and Figure 2, inset). Of the SNPs analyzed using 40 probes (ten quartets), 141 (62%) passed this screening (solid triangles in Figure 2). Of the SNPs analyzed using the 4 probe strategy (two pairs), 1,600 (48%) passed screening, validating this more efficient strategy for SNP detection, while 3,554 (33%) of SFP probes passed the filtering step. Note that the percentage of probesets retained is not a measure of accuracy, as excluded probesets usually make no call, rather than calling the incorrect allele. In aggregate, a total of 5,295 typable T. gondii genetic markers are represented on the array, and accuracy for those SNPs carried forward is >95%. This corresponds to an average density of 1 SNP per 12 kb genome-wide, representing an ~20-fold increase in resolution over prior genotyping efforts. Plasmodium falciparum SNPs were screened using a training set comprising four strains (3D7, HB3, Dd2, and 7G8), yielding a total of 1,700 SNPs, confirming that the strategy of using two probes per allele is not sensitive to the extreme AT-bias of Plasmodium (>80%).
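The marker total and average density quoted above can be checked with a few lines of arithmetic. The sketch below assumes the approximate genome size of ~65 Mb given in the Background; the variable names are illustrative.

```python
# Approximate T. gondii genome size (~65 Mb, as stated in the Background).
GENOME_SIZE_BP = 65_000_000

# Markers passing screening in each of the three design tiers.
markers_passing = {
    "40-probe (ten quartets)": 141,
    "4-probe (two pairs)": 1_600,
    "SFP (single expression probe)": 3_554,
}

total_markers = sum(markers_passing.values())
average_spacing_kb = GENOME_SIZE_BP / total_markers / 1_000

print(f"{total_markers} typable markers, ~{average_spacing_kb:.1f} kb per marker")
```

The three tiers sum to the 5,295 typable markers reported, and the resulting average spacing is consistent with the stated density of roughly 1 SNP per 12 kb.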
In a further illustration of the potential for multiplexing provided by multifunctional chip design, it is interesting to note the potential for using genetic marker probes that fall within coding regions for genotyping in the course of RNA hybridizations. As shown in Additional File 5, reliable calls can be made for ~30 highly expressed polymorphic coding sequence loci that lie close to the 3' end of genes. While insufficient for high resolution genotyping, these data provide a useful, inexpensive, first-pass indication of probable genotype, helping to guard against inadvertent strain contamination.
SNP Discovery (pilot-scale)
Sequence differences distinguishing specific loci have historically been used to discern evolutionary relationships amongst Toxoplasma isolates. As an alternative to traditional sequencing, resequencing arrays provide a rapid means for base-calling using DNA hybridization signals. In typical resequencing arrays, a gene is tiled densely with probes, with each PM probe accompanied by the 3 possible MM probes allowing the correct sequence of the target DNA to be determined. Lower density tiling can also be informative (at far lower cost), through the identification of SFPs rather than specific sequence differences. Simulations using mouse resequencing data showed that high performance could be achieved by tiling PM probes only, at 2 bp density. A further (small) boost was observed by alternating the strand of adjacent probes (Additional File 6). As indicated in Table 1 (and Additional File 6, inset), 17 target genes were selected for tiling based on published and unpublished data indicating their utility for strain typing. Several introns were also tiled, in order to determine rates of neutral mutation. In addition, the entire apicoplast genome and a draft mitochondrial genome (assembled from shotgun sequence data and confirmed by PCR) were tiled at 25 bp resolution.
Exon-Level Analysis (pilot-scale)
Transcript Discovery (pilot-scale)
The current set of T. gondii gene annotations represents the results of an algorithm designed to detect consensus gene structures based on several ab initio and homology-based gene finders, and mapping of an extensive EST library, followed by limited manual curation. Although gene finding has improved in recent years, tiling arrays and deep sequencing have revealed that the level of transcription in most eukaryotic genomes often exceeds what is represented in existing annotation [45, 46]. In order to identify promising regions (genome-wide and on either strand) in which to search for unannotated genes, the entire reference T. gondii genome was filtered to identify:
(i) Unannotated sequences that map to consensus sequences derived by EST clustering (ApiDoTS clusters) containing at least 3 ESTs, yielding 1,189 regions.
(ii) Unannotated ORFs with significant BLASTX hits to the non-redundant GenBank database, yielding 1,943 intergenic ORFs (≥150 nt) that overlap an HSP (bitscore ≥100) by more than 100 nucleotides.
(iii) Unannotated ORFs (≥150 nt) that are significant matches (overlap ≥100 nt, bitscore ≥200) for the query set of OrthoMCL ortholog database sequences using TBLASTN, resulting in 450 ORFs.
Statistical analysis of human spike-in data suggested that a tiling density of 35 bp would be sufficient to reliably detect missed exons across a useful dynamic range of target concentrations (Additional File 7). The 50 kb span displayed in Figure 6 shows moderate expression associated with an unannotated ORF (Ib_ORF1150) that hits a hypothetical protein in Plasmodium yoelii (PY00596), and two adjacent unannotated EST clusters exhibiting high expression.
Host Expression Profiling (pilot-scale)
Most of the intended applications for this microarray focus on the biology of T. gondii, but parasite pathogenesis clearly involves alterations in host cell/organism expression as well. Several key players involved in host adaptive and innate immune responses to T. gondii have been reported, and a small-scale transcriptional profiling study identified additional genes, but the complete host transcriptional profile during T. gondii infection is unknown. In order to permit evaluation of host immune responses, we included both human and the corresponding mouse orthologs (NCBI HomoloGene) for 260 host genes on the array, representing a comprehensive set of cytokines, chemokines, receptors, and other genes likely to function at the parasite-host interface (see Additional File 8 for a complete list). PM-only probesets were derived from the human U133Plus 2.0 and mouse 430 2.0 arrays [50, 51]. This collection provides parasitology researchers with an economical opportunity for studies that may not require genome-wide expression profiling, and the opportunity to explore expression changes in both host and pathogen in parallel. To permit unambiguous detection of signals from parasite vs. host mRNA, parasite gene expression probes were pruned to minimize the potential for cross-hybridization to human or mouse mRNA sequences, and preliminary analysis indicates essentially no reduction in specific signal when a 100-fold excess of host RNA was included in parasite expression profiling studies (data not shown).
This report describes a multifunctional microarray supporting a wide range of studies on the protozoan parasite Toxoplasma gondii (Table 1). Expression profiling confirms previous results obtained on various platforms (Additional File 2), allowing analysis to be extended genome-wide (Figure 1). The perfect match only design employed for this array compares favorably with a small-scale analysis including mismatch controls (for chromosome Ib only), and preliminary results indicate that small differences in sensitivity at low expression levels can be restored using a pool of surrogate mismatch probes selected on the basis of nucleotide composition (Additional File 1). Exon-level analysis (Figure 7) generally supports the overall accuracy of T. gondii gene models. Tiling of regions with significant BLAST or EST hits, but no current gene call, allows the interrogation of additional transcriptionally active regions (Figure 6). All of the expression profiling data described in this report have been deposited with NCBI's Gene Expression Omnibus (GEO), and loaded into ToxoDB.org, enabling a wide range of queries. For example, users may wish to compare genes identified by EST, proteomics, chromatin immunoprecipitation, and microarray analysis. The availability of whole genome expression profiling arrays is expected to facilitate a wide range of studies on stage-specific expression, mutant characterization, etc.
Comparative analysis of expression levels in representatives from each of the three lineages that dominate T. gondii populations in the US and Europe [15, 20] shows that ~49% of the 7,793 T. gondii genes identified in draft 3 annotation are expressed in tachyzoite-stage parasites (Table 2), implying that approximately half of the genome may function exclusively in the latent or sexual life stages. As demonstrated for tachyzoite transcriptional profiling, the microarray described in this paper can identify and prioritize genes that play key roles in these other life stages for further functional studies. For example, tachyzoite-to-bradyzoite stage transition experiments have yielded a robust set of genes that appear to be involved in early bradyzoite differentiation (Roos et al., manuscript in preparation).
It is interesting to note the unusually low variance observed in biological replicates (Figure 1A), perhaps reflecting the homogeneity of the intracellular niche occupied by these parasites. These studies also reveal substantial differential expression between lineages (Table 3), with ~26% of all genes showing significant differences in at least one strain (although the high percentage of differentially expressed genes may simply reflect the low variance among biological replicates, which raises statistical power to detect subtle changes in expression levels). Secreted proteins known to play an important role in virulence [30, 31] are particularly notable for their extreme inter-strain differences in gene expression (Figure 1A). ToxoDB employs strain-specific library files (a mapping of probes to genes) that eliminate polymorphic probes, avoiding false positives due to SNPs when determining differential expression.
The most unusual aspect of this study is the incorporation of both expression profiling and genotyping probes on the same array, broadening the utility of this chip for biological analysis. Standard array-based genotyping strategies (40 probes) were modified in light of the discovery that same-strand probes are largely redundant (Figure 3), particularly for haploid T. gondii parasites. Probesets including only 4 features passed high-stringency screening nearly as frequently as 40 feature probesets (48% vs. 62%), but are much more economical, improving T. gondii genotyping from the >300 kb resolution currently available using 186 RFLP markers to ~37 kb (using 1,600 markers; Figure 2). 2,000 well-validated P. falciparum SNPs were also included on this array, providing an economical means for genotyping of the most lethal human malaria parasite [17, 18].
Sequences for two additional T. gondii isolates were released subsequent to chip production, revealing >600 K biallelic SNPs (Additional File 3). 33% of expression profiling probes that fortuitously overlap SNPs passed the strict quality control parameters established for genotyping, despite the absence of a complete probe quartet centered on the SNP, providing an additional ~3,500 reliable genotyping markers (Figure 2, inset). Using the entire set of 5,295 typable markers to evaluate the progeny of a cross previously analyzed by standard methods revealed ~99% concordance, while mapping crossovers to higher resolution and identifying several additional recombination points (Figure 4). It will be interesting to investigate the several instances of apparent micro- and telomeric crossovers identified in this analysis. Higher resolution genotyping at lower cost should greatly facilitate QTL and other genetic mapping studies [21, 52].
Numerous other features were included on this multifunctional array (Table 1), including surrogate mismatch probes to facilitate background subtraction (Additional File 1), and probes for array-CGH studies on the tiled apicoplast genome. Apicoplast SFPs were used to demonstrate uniparental inheritance, with either parent able to provide the macrogamete (Figure 4, inset); transcript profiling (Figure 5) supports the proposed operon model for transcription of this organellar genome.
The driving motivation for this array design was to support low cost whole genome expression profiling for the protozoan parasite Toxoplasma gondii, by reducing standard chip size (in accordance with the relatively small parasite genome), and eliminating mismatch probes (which provide minimal advantage). This reagent has been widely adopted by the T. gondii research community. Excess space available on the array was exploited to support high resolution, low cost genotyping, taking advantage of the discovery that 4 feature probesets are nearly as effective as 40 feature probesets. The multifunctional nature of this array has provided many unexpected advantages, including the opportunities to exploit expression profiling probes as SFP markers, and the ability to use genotyping probes for strain validation during RNA hybridization experiments. Many of the principles employed in this design are applicable to other species.
Array Design and Production
T. gondii genome sequences (type II strain ME49) and gene models (draft III) were obtained from ToxoDB.org. The apicoplast genome sequence (type I strain RH) was from GenBank (acc# NC001799) and the mitochondrial genome sequence was inferred by alignment of sequence fragments (D. Shanmugam and L. Peixoto, unpublished data). P. falciparum SNPs were kindly provided by X. Su. A custom photolithographic microarray containing 25-mer oligonucleotides (11 micron feature size, 169 format) was designed and manufactured using the Affymetrix CustomExpress™ Array Program (Santa Clara, CA). Content is described in Table 1, and arrays are available through the Penn Microarray Facility. For further information, including custom analysis algorithms, library files, and ordering instructions, visit ToxoDB (http://www.ToxoDB.org).
Expression Profiling Hybridizations and Data Analysis
For expression analysis based on the 3'-biased probesets, PrugniaudΔHXGPRT, RH, and VEG strain parasites were cultured in human foreskin fibroblast (HFF) cells as previously described. Prior to host cell rupture, cells were scraped from the flask and spun at 300 g for 9 min. The resultant pellet was lysed with Buffer RLT from the Qiagen RNeasy Mini Kit (Valencia, CA) and RNA was extracted according to the manufacturer's instructions. Labeled cRNA was created using the One-Cycle Labeling protocol in the Affymetrix GeneChip® IVT Labeling Kit (Santa Clara, CA). RNA used for exon-level analysis was isolated from Prugniaud strain parasites using the same procedures, but labeling was performed with the Affymetrix Whole Transcript Sense Target Labeling Kit according to the manufacturer's instructions without rRNA reduction. Hybridization, washing, and scanning of arrays were performed using standard Affymetrix instrumentation and protocols for 11 micron, 169 format arrays. Biological triplicates were generated for each strain and expression values were computed using the RMA implementation (default parameters) in the affy package from Bioconductor. Differential expression was determined using SAM at a 1% false discovery rate. The data from these 12 hybridizations have been deposited into GEO [GSE20145]. A more inclusive transcriptomic comparison set was subsequently generated from additional Toxoplasma strains; this set serves as the data source for Figure 1A and is publicly available for querying and download from ToxoDB.org.
Comparison to SAGE, EST, and glass array data
38,263 unique 3-prime T. gondii SAGE tags were associated with gene models if they mapped (exact 14-mer match) within a predicted CDS or the downstream 700 bp region (an estimate of the 3rd quartile of the UTR-length distribution, based on UniGene cluster analysis done for this study), resulting in 1,229 genes being linked with SAGE data. The 125,741 T. gondii ESTs deposited in dbEST (NCBI) were mapped to the reference genome using Splign, and EST-gene links were made by filtering for EST coverage (≥80%) and extent of overlap (≥50 nt) with genes and their estimated 3'-UTR regions. This resulted in 2,336 genes being associated with EST data. Glass array data for 2,449 sequenced cDNA spots were associated with genes using the same criteria as for ESTs, linking 501 genes with these hybridization intensities. In total, 3,077 genes (40% of the genome) were linked to at least one of these three sources of expression data. For comparison with parasite tachyzoite expression profiles derived from the Affymetrix array, tag counts from EST and SAGE data were filtered to include only unbiased (i.e. not normalized) tachyzoite stage libraries (SAGE: day6, MSJ, RH, and B7), and then normalized to tags per 100,000 (Tp100K = observed count * (100,000/total tachyzoite tags)). The resulting tag values were binned to make expression level calls (low, medium, high) based on the following thresholds: low expression when <0.02% of cellular mRNA content (i.e. 0 < Tp100K < 20); medium expression from 0.02 - 0.1% of mRNA content (20 ≤ Tp100K < 100); high expression when ≥0.1% of mRNA content (Tp100K ≥ 100). Data from the glass arrays (tachyzoite controls from a differentiation experiment) were binned as follows: a gene was defined as exhibiting low expression level if hybridization was indistinguishable from background, medium if up to 2× background, and high if >2× background. See inset in Additional File 2 for relative numbers of genes exhibiting high, medium, and low level expression by each method.
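The tag normalization and binning described above can be written compactly. The following is a minimal sketch: the function names are illustrative, and the treatment of a zero count as "no call" is an assumption, since the text defines the low bin only for 0 < Tp100K < 20.

```python
def tp100k(observed_count, total_tachyzoite_tags):
    """Normalize a raw tag count to tags per 100,000 (Tp100K)."""
    return observed_count * (100_000 / total_tachyzoite_tags)


def expression_bin(value):
    """Assign a Tp100K value to the low/medium/high bins used in the text.

    Returns None when no tags were observed (assumption: such genes
    receive no call rather than a 'low' call).
    """
    if value <= 0:
        return None
    if value < 20:      # < 0.02% of cellular mRNA content
        return "low"
    if value < 100:     # 0.02% to < 0.1% of mRNA content
        return "medium"
    return "high"       # >= 0.1% of mRNA content


# Example with made-up counts: 30 tags out of 100,000 tachyzoite tags.
print(expression_bin(tp100k(30, 100_000)))
```

Binning on the normalized scale, rather than on raw counts, is what allows SAGE and EST libraries of different depths to be compared with the same thresholds.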
Present/Absent Calls for Genes and Exons
For gene-level "present" calls, labeled sense-strand mRNA was used to define a null background RMA distribution for each gene using the 3'-biased probesets. P-values for presence were then assigned to each gene based on hybridizations with labeled antisense RNA. These P-values were used to set a 10% FDR threshold by the Benjamini-Hochberg method. Exon-level presence calls were made using a similar procedure, with the null distribution of RMA values defined using antisense RNA, as exon probes are antisense in orientation to their corresponding mRNA. A 10% FDR was also used for calling exon presence.
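The presence-call logic can be sketched as follows, under stated assumptions. The text does not specify how P-values were derived from the null distribution, so the one-sided empirical P-value below (with a +1 correction) is an illustrative choice and not necessarily the authors' calculation. The Benjamini-Hochberg step-up procedure itself is standard. All names and numbers are hypothetical.

```python
def empirical_p_value(observed, null_values):
    """One-sided empirical P-value: share of null values >= observed.

    Assumption: the +1 correction keeps the P-value from being exactly zero.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def benjamini_hochberg(p_values, fdr=0.10):
    """Return booleans marking the hypotheses rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene P-values for illustration only.
p_values = [0.001, 0.04, 0.03, 0.5, 0.9]
present = benjamini_hochberg(p_values, fdr=0.10)
print(present)
```

Genes flagged `True` would be called "present" at a 10% FDR. The same two functions could be applied to exon-level values with the appropriate null hybridization.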
Simulations of Haploid Genotyping and SNP Discovery
DNA from each of the inbred mouse strains DBA/2J and C57/B6 was hybridized to a 1 bp resequencing microarray designed for random regions of the C57/B6 mouse genome, and 109 high confidence SNPs were identified using standard resequencing analysis algorithms (D. Kulp, unpublished results). Using these data as a reference for the detection of known SNPs, P-values associated with a Kolmogorov-Smirnov test distinguishing the two strains were computed using different probe tiling strategies.
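As an illustration of the statistic behind this comparison, the following sketch computes the two-sample Kolmogorov-Smirnov statistic D (the largest gap between two empirical distribution functions) in plain Python. The intensity values are hypothetical. Converting D to a P-value is left to a statistics library (for example, `scipy.stats.ks_2samp`), and how probes were grouped under each tiling strategy is not specified in the text.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the two empirical
    cumulative distribution functions, evaluated at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)  # fraction of a <= value
        cdf_b = bisect_right(b, value) / len(b)  # fraction of b <= value
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical probe intensities for two strains at one candidate SNP.
strain_1 = [1, 2, 3, 4]
strain_2 = [3, 4, 5, 6]
print(ks_statistic(strain_1, strain_2))
```

A larger D indicates that the two sets of probe intensities are more clearly separated, which is what a SNP-sensitive tiling design aims for.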
Genome alignments and SFP Discovery
Genomic assemblies of GT1 (type I) and VEG (type III) were obtained from ToxoDB.org and aligned to the reference strain ME49 (type II) using NUCmer. Regions of ME49 with unambiguous mappings were scanned for SNPs using the show-snps program from the MUMmer package. A total of 10,903 single feature polymorphisms (SFPs) were identified by searching for predicted SNPs that overlapped one of the 85,723 3'-biased probes designed for expression profiling. Apicoplast SFPs were discovered from hybridization differences among apicoplast probes between GT1 (type I), Pru (type II), and CTG (type III) DNA hybridizations.
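The overlap search between predicted SNPs and expression probes can be sketched as an interval lookup. This is a minimal illustration and not the authors' pipeline: the record layouts, contig names, and coordinates below are hypothetical, and probe coordinates are assumed to be 1-based and inclusive on the reference (ME49) sequence.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def find_sfps(snps, probes):
    """Map each probe to the SNP positions that fall inside it.

    snps:   iterable of (contig, position)
    probes: iterable of (probe_id, contig, start, end), inclusive coordinates
    Returns {probe_id: [positions]} for probes overlapping at least one SNP.
    """
    positions_by_contig = defaultdict(list)
    for contig, position in snps:
        positions_by_contig[contig].append(position)
    for positions in positions_by_contig.values():
        positions.sort()

    hits = {}
    for probe_id, contig, start, end in probes:
        positions = positions_by_contig.get(contig, [])
        lo = bisect_left(positions, start)   # first SNP at or after start
        hi = bisect_right(positions, end)    # one past the last SNP <= end
        if hi > lo:
            hits[probe_id] = positions[lo:hi]
    return hits


# Hypothetical SNPs and 25-mer probes for illustration only.
snps = [("chrIa", 105), ("chrIa", 300), ("chrIb", 110)]
probes = [
    ("p1", "chrIa", 100, 124),
    ("p2", "chrIa", 200, 224),
    ("p3", "chrIb", 86, 110),
]
sfp_hits = find_sfps(snps, probes)
print(sfp_hits)
```

Sorting SNP positions once per contig and using binary search keeps the lookup efficient when hundreds of thousands of SNPs are checked against tens of thousands of probes.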
Genotyping Hybridizations and Data Analysis
Toxoplasma gondii genomic DNA was isolated from RH, Pru, VEG, and select recombinant progeny by scraping and pelleting cultured parasites (as above) and then using the Gentra Systems Generation DNA isolation kit (Minneapolis, MN) according to the manufacturer's instructions. Purified DNA was diluted in 750 µl TE pH 8.0 containing 10% glycerol, with the addition of 2 µl molecular biology-grade glycogen (20 mg/ml). Approximately 800 ng of diluted DNA was added to an Invitrogen nucleic acid nebulizer on ice, and compressed nitrogen was applied at 40 psi for 3 min to shear the DNA. Fragmented DNA was alcohol-precipitated, heated, and labeled for 2 hr using the Invitrogen BioPrime Array CGH Genomic Labeling module with biotin-14-dCTP according to the manufacturer's protocol. Labeled DNA was cleaned with the Purification module and hybridized to the microarray as described above. Data analysis was conducted in Bioconductor, using custom R algorithms. Genetic markers were called using the Wilcoxon signed-rank test, comparing the 10 PM probe intensities for allele 1 with the 10 PM probe intensities for allele 2 (P-value ≤ 0.10). EST-based SNPs were called on the basis of the mean allelic ratio of the 2 pairs of PM probes (ratio ≥ 1.5). SFP calls were made based on the distances between the background-corrected (RMA) and normalized (quantile) intensity value of a polymorphic probe in a progeny hybridization and its counterparts in the parental hybridizations.
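The marker-calling step can be sketched in Python under several assumptions. The analysis described above was done with custom R code, which is not reproduced here. The sketch assumes that the 10 PM probes for the two alleles are paired by position, that the test is two-sided, and that the call direction follows the mean intensity. The exact P-value is obtained by enumerating all sign assignments, which is feasible for 10 probe pairs. All names and intensities are hypothetical.

```python
from itertools import product


def signed_rank_p_value(x, y):
    """Exact two-sided Wilcoxon signed-rank P-value for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Null distribution: every difference is positive or negative with equal odds.
    lower = upper = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        lower += w <= w_plus
        upper += w >= w_plus
    return min(1.0, 2 * min(lower, upper) / 2 ** n)


def call_marker(allele1_pm, allele2_pm, alpha=0.10):
    """Call the allele with the higher mean PM signal if the test is significant."""
    if signed_rank_p_value(allele1_pm, allele2_pm) > alpha:
        return "no call"
    mean1 = sum(allele1_pm) / len(allele1_pm)
    mean2 = sum(allele2_pm) / len(allele2_pm)
    return "allele1" if mean1 > mean2 else "allele2"


# Hypothetical PM intensities for one marker (10 probes per allele).
allele1 = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290]
allele2 = [100, 109, 118, 127, 136, 145, 154, 163, 172, 181]
print(signed_rank_p_value(allele1, allele2), call_marker(allele1, allele2))
```

With 10 probe pairs the smallest attainable two-sided P-value is 2/1024, so a threshold of 0.10 leaves room for a few discordant probes while still supporting a call.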
List of Abbreviations used
HXGPRT: hypoxanthine-xanthine-guanine phosphoribosyl transferase
RMA: robust multi-array average
SFP: single feature polymorphism
SNP: single nucleotide polymorphism
Acknowledgements
This work was supported by the following NIH grants: AI077268 and RR016469 (PHD), AI072739 (MWW), HG003880 (DK), and AI028724 (DSR).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.