High-resolution genotyping and mapping of recombination and gene conversion in the protozoan Theileria parva using whole genome sequencing
© Henson et al.; licensee BioMed Central Ltd. 2012
Received: 11 February 2012
Accepted: 13 September 2012
Published: 23 September 2012
Theileria parva is a tick-borne protozoan parasite, which causes East Coast Fever, a disease of cattle in sub-Saharan Africa. Like Plasmodium falciparum, the parasite undergoes a transient diploid life-cycle stage in the gut of the arthropod vector, which involves an obligate sexual cycle. As assessed using low-resolution VNTR markers, the crossover (CO) rate in T. parva is relatively high and has been reported to vary across different regions of the genome; non-crossovers (NCOs) and CO-associated gene conversions have not yet been characterised due to the lack of informative markers. To examine all recombination events at high marker resolution, we sequenced the haploid genomes of two parental strains, and two recombinant clones derived from ticks fed on cattle that had been simultaneously co-infected with two different parasite isolates.
By comparing the genome sequences, we were able to genotype over 64 thousand SNP markers with an average spacing of 127 bp in the two progeny clones. Previously unrecognized COs in sub-telomeric regions were detected. About 50% of CO breakpoints were accompanied by gene conversion events. Such a high fraction of COs accompanied by gene conversions demonstrated the contributions of meiotic recombination to the diversity and evolutionary success of T. parva, as the process not only redistributed existing genetic variations, but also altered allelic frequencies. Compared to COs, NCOs were more frequently observed and more uniformly distributed across the genome. In both progeny clones, genomic regions with more SNP markers had a reduced frequency of COs or NCOs, suggesting that the sequence divergence between the parental strains was high enough to adversely affect recombination frequencies. Intra-species polymorphism analysis identified 81 loci as likely to be under selection in the sequenced genomes.
Using whole genome sequencing of two recombinant clones and their parents, we generated maps of COs, NCOs, and CO-associated gene conversion events for T. parva. The data comprises one of the highest-resolution genome-wide analyses of the multiple outcomes of meiotic recombination for this pathogen. The study also demonstrates the usefulness of high throughput sequencing typing for detailed analysis of recombination in organisms in which conventional genetic analysis is technically difficult.
The tick-transmitted protozoan Theileria parva is an apicomplexan parasite that is phylogenetically distantly related to the human malaria parasite Plasmodium falciparum. T. parva causes East Coast fever (ECF), a costly tick-borne disease that kills about 1 million cattle per year in eastern, central and southern Africa . ECF threatens some 25 million cattle in 11 countries and is now putting at risk a further 10 million animals in new regions, such as southern Sudan, where the parasite has recently been discovered to be endemic .
The life cycle of T. parva involves transmission of the sporozoite stage to cattle or buffalo in the saliva of feeding adult and nymphal Rhipicephalus appendiculatus ticks . Within the bovidae host, sporozoites invade lymphocytes and this ensuing multi-nucleate schizont stage immortalizes infected lymphocytes and divides in synchrony with them, ensuring that the parasite is transmitted to each daughter cell. A proportion of schizonts develop subsequently into merozoites followed by the piroplasm stage that is infective to the R. appendiculatus ticks. After ingestion by the tick, free piroplasms develop into micro- and macro-gametes that undergo syngamy to form the diploid zygotes [4, 5]. Zygotes undergo meiotic division accompanied by sexual recombination resulting in the development of the kinete stage that later invades tick salivary gland type III acinar cells. Sporozoites differentiate within these acini and become the infective stage again for mammalian hosts .
Vaccination against ECF is achieved by the live “infection and treatment” immunisation method that induces potent cytotoxic T cell responses against the schizont stage of the parasite . This immune protection is partially T. parva isolate-specific, dominated by responses targeting a limited range of antigens in the vaccinated animal and vulnerable to breakthrough infection upon challenge [7, 8]. For obligate sexually reproducing eukaryotic pathogens with a complex life cycle in arthropod vectors and mammalian hosts, a map of recombination events is essential for understanding parasite genome evolution. Such data provide insight into the mechanisms of immune evasion and host and parasite co-evolution. In-depth studies for T. parva are at an early stage due to lack of informative high-resolution markers. The approaches previously used to dissect T. parva parasite polymorphism and recombination events include monoclonal antibodies , gene probes binding to multi-copy loci , panels of micro- and mini-satellite markers [10, 11] and most recently the sequence polymorphism in antigens that are targets of CD8+ T cell responses . While useful to generate preliminary information, these techniques were unable to provide high resolution dissection of recombination events at the whole genome level.
Recent experimental analyses performed in T. parva based on a panel of Variable Number Tandem Repeat (VNTR) and PCR-Restriction Fragment Length Polymorphism (RFLP) markers [10, 13] indicate high frequencies of crossover events. An initial linkage map of T. parva has been constructed using 79 VNTR markers and 35 recombinant clones derived from ticks fed on cattle co-infected with Muguga and Marikebuni stocks . The average genome-wide crossover rate was calculated to be approximately three times higher than that in Plasmodium falciparum. In total 434 crossover (CO) events, ten recombination hot spots and 13 cold spots were identified .
Massively parallel DNA sequencing technologies are becoming affordable for genotyping and have been applied to map recombination events by sequencing progeny strains and their parents across a wide range of different taxa. These include humans [14, 15], rice , yeast [17, 18], Toxoplasma gondii and most recently P. falciparum. The elucidation of the T. parva (Muguga) genome sequence  provides a reference template for the generation of a genome-wide panel of single nucleotide polymorphism (SNP) markers for high-resolution genotyping.
We describe herein the whole genome sequencing of two recombinant clones resulting from co-infection experiments of cattle with different T. parva isolates. With the panel of high density SNP markers established via sequencing, we identified crossover and non-crossover events, as well as CO-associated gene conversions. Intra-species genome comparison identified a preliminary list of genes undergoing positive selection, which can be useful for the future systematic analysis of T. parva genes involved in co-evolution with its tick vector and its cattle and cape buffalo (Syncerus caffer) hosts.
Results and discussion
Whole genome sequencing, mapping and de novo assembly
Sequencing of T. parva parent and progeny strains using the Roche 454 GS FLX sequencer with Titanium chemistry
Number of reads
Number of bases
Mean length (nt)
Median length (nt)
Mode length (nt)
PE (8 Kb) 1
PE (8 Kb)
PE (11 Kb)
PE (3 Kb)
Comparison of reference assembly and de novo assembly of T. parva parent and progeny genomes
Number of scaffolds
N50 scaffolds (nt)
Number of contigs (nt)1
N50 contigs (nt)1
Assembled bases (nt)1
De novo 2
The 454 reads were also assembled de novo into scaffolds and contigs (Table 2). For all isolates except MugugaMarikebuni, the N50 contig length produced by the de novo assembly was higher compared to that of mapped assembly, indicating that de novo assembly was less fragmented. The total numbers of bases assembled were also significantly higher (P-value < 0.05) in all isolates for the de novo assembly compared to those in the mapped assemblies, irrespective of the difference in sequence coverage between isolates. This also suggested the presence of highly divergent sequences among the strains, from which the reads could not be mapped back to the reference T. parva Muguga sequence but could be assembled de novo.
Southern blot analysis of recombinant MugugaUganda using telomeric probes
Identification of SNP markers and parental alleles
Unmapped and unaligned regions in T. parva Muguga genome by pair-wise comparisons to the four genomes sequenced in this study
Unmapped bases 1
Unmapped and unaligned bases (% of unmapped bases)2
Number of genes with unmapped and unaligned bases
Total number of SNPs in T. parva parent and progeny genomes
Total number of SNPs
Number of SNPs inferred from reference assemblies
Number of SNPs inferred fromde novoassemblies
Detection of crossover events
The high marker density also allowed us to study CO associated gene conversions (Additional file 6: Table S1). In MugugaMarikebuni, eight (53%) of the CO breakpoints were simple breakpoints, where one parental allele changed into the other parental allele without any associated gene conversion. The other seven were complex breakpoints, associated with conversion tracts containing frequent allele changes. The length of the gene conversion tracks ranged from 7 bp to 1 Kb, with a median length of 277 bp (Figure 6A). In MugugaUganda, 11 (46%) were simple breakpoints and 13 (54%) were complex breakpoints. The median length of gene conversion tracks was 426 bp (with a size range from 17 bp to 11 Kb) (Figure 6A). In both progeny strains, the median length of the conversion tracks accompanying CO breakpoints was much shorter than the observed length of 2 Kb in yeast [17, 24].
The percentage of complex breakpoints (about 50%) was higher to those reported for P. falciparum recombinants (33%, ) and yeast (11.5%, ). Such complex gene conversion tracks associated with COs have been predicted to result from the resolution of a double Holliday junction because of multiple distinct hetero-duplexes in a single CO event . Mismatch repair alternating between conversion and restoration could also lead to formation of such complex gene conversion tracks . This high fraction of COs accompanied by gene conversions could contribute to the diversity and evolutionary success of T. parva, by not only redistributing existing genetic variations, but also by changing allelic frequencies.
In MugugaUganda, each chromosome had at least one CO event, which is consistent with the expectation of one obligate CO per pair of homologous chromosome . In MugugaMarikebuni, no CO events were detected in chromosome 4. Chromosomes without detectable COs were also observed amongst the 35 MugugaMarikebuni recombinant parasite clones typed with the VNTR markers (Additional file 5: Figure S1). This might be due to the fact that the obligate CO on chromosome 4 could have been formed on the sister chromatid not analyzed here.
Effects of detection methods on crossover identification
To investigate the effect of the method employed for crossover identification, two sliding window methods to determine COs were implemented and applied for both progeny strains. One method varied the number of SNPs (Additional file 7: Figure S2 A) and the other method varied the physical distance (Additional file 7: Figure S2 B). The number of SNPs (7, 11, 15, 19, and 23) selected for each window size has been used in a similar study . The physical distances (1 to 6 Kb) that were calculated based on the mean marker distance of 127 bp, roughly corresponded to the average distances between 7, 15, 23, 31, 39, and 47 SNPs, respectively.
In both methods, smaller window sizes identified more COs, and the extra COs tended to be smaller. For example, when the sliding window size was decreased to seven contiguous markers, one CO breakpoint of 1 Kb was detected on chromosome 4 at position 1.797 Mb in MugugaMarikebuni, which was not identified using larger window sizes. On chromosome 2 in MugugaMarikebuni, the 1 Kb window also detected two COs, which became undetectable when the window size increased. Counts and positions of major CO events remained the same for both methods, although the sensitivities of the two methods seemed to vary for small COs. For example, both the 15 SNPs and the 2 Kb distance windows scanned through the chromosomes with an average resolution of 2 Kb, the window of 2 Kb called smaller COs in both progeny strains on most chromosomes compared to the 15 SNP window. Also, the estimated sizes of the breakpoints obtained from these two methods were not always identical. For example, CO breakpoints on chromosome 1 at positions around 1.1 Mb in MugugaMarikebuni were detected by all window sizes. The breakpoint sizes calculated by the SNP method were too short to be visible on graphical plots, while those calculated by the method of physical distance were large enough to be visible in the plots of the same resolution (Additional file 7: Figure S2). On chromosome 2 in MugugaMarikebuni, the opposite was observed; here the breakpoints became visible with the SNP method but not with the physical distance scanning.
The differences between the two methods can largely be explained by the variation of SNP densities along the chromosomes. Windows of SNPs would scan through highly variable regions with shorter physical distance, while windows of physical distances would scan through such regions with higher number of markers. Consequently, in regions with higher marker density, the SNP method would detect smaller COs and in regions with lower marker density the distance method would annotate smaller COs. In our study, the distance method identified higher numbers of smaller COs than the SNP method, suggesting that regions of lower marker density were more abundant in our dataset than regions of higher marker density. We decided to use the SNP method using 15 markers for further analysis for the following practical reasons. First, the method has been applied in a similar study in P. falciparum therefore allowing a direct comparison of the results between the two species. Second, we wanted to minimize detection of small COs as our study design did not make such calls with high confidence. Ideally, COs would be events where nucleotide sequences change from parent one (p1) to parent two (p2) in one meiotic product and from p2 to p1 in another at similar positions. An event where such a change in nucleotide sequence occurs only in one meiotic product is a non-crossover (NCO) [17, 18]. However, due to the technical challenges involved in generating recombinant clones, in the current study only one out of four meiotic products was sequenced for each pair of parental clones. Consequently, we cannot exclude the possibility that some of the COs could be tracts of NCOs. For example, small COs found on chromosome 1 and chromosome 2 in MugugaMarikebuni (Figure 5, Additional file 6: Table S1) might actually be NCOs, considering the very low number of Muguga alleles observed on these two chromosomes (Figure 4).
Detection of non-crossover events
The high density of markers also allowed us to characterize non-crossover (NCO) gene conversions. Fifty-two and 56 NCO events were identified in MugugaMarikebuni and MugugaUganda, respectively (Figure 5). In both progeny clones, all chromosomes carried one or more NCOs. In total, we identified 15 COs and 52 NCOs for MugugaMarikebuni and 24 COs and 56 NCOs for MugugaUganda. This sum of COs and NCOs (67 and 90) was lower than the estimate of 140–170 double-strand breaks (DSBs) per meiosis in yeast, but higher than the 49 described in P. falciparum. Yeast and P. falciparum are not suitable comparators for T. parva as although the genome sizes of yeast and T. parva are similar (12 Mb vs. 8.3 Mb) yeast has more chromosomes (16 vs. 4). Additionally P. falciparum, which is pylogenetically closer to T. parva, has a larger genome (23.3 Mb vs. 8.3 Mb). The total recombination events we observe is likely to represent a minimal estimate of the true total. The stringent definition of NCO applied here may have missed conversion events involving less than three SNP markers, or those that may have resulted in restoration of the original alleles. NCO signatures containing a complex pattern of genotype changes were also not included in the totals. This includes several long and complex gene conversion signatures observed in sub-telomeric regions in both progeny clones (Figure 5). By analyzing only one meiotic product out of the four, theoretically half of the COs and two thirds of the NCOs would have been missed. We were not able to differentiate long NCOs (i.e. NCOs involving 15 markers or more) from COs either, although this would not affect the total. As discussed earlier, genotyping of all four meiotic products will be needed to address these issues .
Compared to the length of CO-associated gene conversion signatures, the length of NCO conversion tracks was not significantly different (Mann–Whitney Wilcoxon test, P-value = 0.3297, Figure 6A), which deviates from previous observations in yeast and humans [27, 28]. The median length of the NCO gene conversion track was 108 bp and 207 bp in MugugaMarikebuni and MugugaUganda, respectively. This is considerably shorter than estimates from Saccharomyces cerevisiae[17, 29] but similar to those in Homo sapiens. In MugugaMarikebuni, seven NCOs overlapped with the previously defined CO hot spots and eight with putative non-recombinogenic regions. In the MugugaUganda progeny clone considerably more NCOs overlapped with the known CO cold spots compared to hot spots (12 vs. 2). In both strains, NCOs were distributed rather uniformly across each chromosome, including the sub-telomeric regions. However, clusters of NCOs were also observed such as in the sub-telomeric region of chromosome 2.
Crossover and non-crossover interference
The distance between recombination events could reflect interference. In yeast CO-CO and CO-NCO interference has been reported [17, 18]. We analyzed the distributions of distances between CO-CO (Figure 6B), NCO-NCO (Figure 6C), and CO-NCO (Figure 6D) events. All three distributions were consistent with the Poisson distribution (Figure 6B, C and D), suggesting there was no measurable interference among the recombination events. However, more measurements would be required for a thorough study of interference among COs and NCOs.
Effects of marker density on detections of crossover and non-crossover
Genomic locations of recombination breakpoints
The location of breakpoints differed between COs and NCOs. Most of the NCO breakpoints (~65%) occurred within genes, while ~15% were either located in IGR or overlapped with the gene-IGR boundary (Additional file 8: Table S2). In contrast, a majority (70-80%) of the CO breakpoints overlapped with the gene-IGR boundaries, only five (33%) MugugaMarikebuni and three (13%) MugugaUganda CO breakpoints fell within genes. Most DSBs are known to occur in the promoter regions [31, 32]. Classical eukaryotic promoter elements are thought to be absent in apicomplexan species, and it appears that regulatory motifs located in the IGR are responsible for transcription regulation . Studies in yeast showed that > 84% of DSBs were located in promoter regions [17, 31, 32]. In comparison, a significantly smaller proportion (P-value = 0.0001, Fischer’s exact test) of CO and NCO breakpoints in MugugaMarikebuni - 47% (7) of COs, 3% (5) of NCOs - and MugugaUganda - 42% (10) of COs, 15% (8) of NCOs - overlapped with the putative promoter regions. Interestingly, of the breakpoints that overlapped with promoter regions, 25% (3) and 28% (5) in MugugaMarikebuni and MugugaUganda, respectively, contained four of the five most over-represented regulatory motifs in T. parva identified by Guo and Silva .
Correlation of recombination events with chromosome size
In the two recombinant clones analyzed, the number of COs and NCOs did not correlate with chromosome sizes (Additional file 9: Figure S6). There were differences in the distribution of COs among the four chromosomes in the two recombinant clones. For example, on chromosome 1 of MugugaMarikebuni, there was one CO per 1.3 Mb, while it was one per 0.3 Mb in MugugaUganda. Distributions on chromosomes 2 and 3 were similar in both progeny strains, with one CO per 0.5 Mb on chromosome 2 and 0.3 Mb on chromosome 3, respectively. Chromosome 4 showed the lowest number of COs in both clones. Analysis of more recombinants would be essential to substantiate these results and to identify any correlations, if they exist.
Intra-species polymorphism analysis
Although T. parva genes potentially under positive selection have been previously identified by inter-species polymorphism analysis , genome wide intra-species polymorphism analysis has not been performed systematically. We aligned 4076 T. parva Muguga mRNA sequences to genome drafts of Marikebuni, MugugaMarikebuni, MugugaUganda and Uganda. About 3600 mRNA sequences could be fully aligned and were subjected to Ka/Ks analysis (Additional file 10: Figure S3). The mean value of Ka/Ks ratio was about 0.7 (Additional file 11: Figure S4). In total, we identified 81 genes with Ka/Ks ratio above 1 among the four strains (Additional file 12: Table S4). Comparison of these genes to the eukaryotic orthologous groups (KOGs) did not find any functional category that was enriched within these genes potentially under positive selection. The BLAT alignment method of identifying gene homologues in the four strains limits our analysis to less diverged genes. Genes with highly divergent sequences were not subjected to Ka/Ks calculation as a full-length sequence alignment with their respective Muguga counterpart was not obtained. Amongst these diverse genes is the CTL antigen, Tp2  that was found to be under positive selection in cattle and buffalo isolates . Pain et al.  predicted a total of 3265 orthologous genes between T. parva and T. annulata genomes. Inter-species Ka/Ks analyses of these orthologous genes produced highest Ka/Ks values for predicted merozoite surface proteins and macroschizont proteins without predicted membrane retention motifs potentially secreted into the leukocyte cytosol. These proteins could be involved in immunological interplay between host and parasite. Amongst genes without KOG matches but with Ka/Ks ratio above 1 was TP02_0174, which has been annotated as a putative sporozoite surface protein. Its orthologue in T. annulata has been characterized as sporozoite and macro-schizont gene 2 (Spm2) expressed in sporozoites, macro-schizont-infected leucocyte and piroplasm stages. A monoclonal antibody specific for Spm2 blocked partial invasion of sporozoites to host cells indicating that this protein could play a role in protective immune responses .
Roche 454 GS FLX Titanium chemistry was used to sequence the 8.3 Mbp genomes of two haploid progeny clones from an experimental T. parva co-infection in cattle and two of the parent isolates. All four strains were genome typed by comparing the 454 reads with the T. parva Muguga reference genome, which represented the other parent of both recombinant clones. Our data demonstrate that the A-T rich genome of T. parva can be reliably sequenced and assembled with 454 technology. The panel of genome-wide SNPs established can be used for molecular epidemiology studies in the future, particularly to trace potential new alleles introduced in T. parva endemic regions by applying the live vaccination method of “infection and treatment” . Additionally, the list of genes with high Ka/Ks values by intra-species comparisons can be further explored for potential functional implications. The high resolution mapping of COs and NCOs on all four chromosomes showed patterns distinct from P. falciparum, yeast and humans. The high number of gene conversion events found to be associated with recombination between different T. parva genotypes was not previously known. This observation, together with the relatively high crossover frequency reported previously, indicates that meiotic recombination plays a very important role in generation and distribution of genetic diversity in T. parva population, which is essential for the evolutionary success of the pathogen. The high multiplicity of infection in both cattle and the ancestral wildlife host, cape buffalo, suggests that recombination processes may occur frequently under natural conditions, thus contribute to the high level of polymorphism observed in T. parva in the field.
Parasite parental and recombinant clones
This study describes the whole genome sequencing of two cloned T. parva genotypes that were originally described by Morzaria et al., , T. parva Marikebuni (clone stabilate 3292) and T. parva Uganda (clone stabilate 3645) and two recombinant parasite clones derived thereof . Additional details regarding the generation of the clonal recombinant parasites are provided herein. All protocols and experimental procedures relating to in vivo infections used to generate recombinant parasite clones and piroplasms for extraction of DNA for genome sequencing were first evaluated by the ILRI Animal Care and Use Committee (IACUC), which ensures that all animal experiments adhere to international standards.
Animal infection and recovery of recombinant parasite clones
Cattle were inoculated with two different T. parva stocks, either Muguga and Marikebuni or Muguga and Uganda. From these co-infected animals, tick pick up experiments were performed and sporozoites were isolated from these tick salivary glands. Acini containing putatively recombinant parasites were identified using PCR amplification with conserved Tpr locus primers, followed by hybridization with stock-specific oligonucleotide probes designed from the Tpr locus [40, 41]. The hybridization analysis indicated that 38% of these acini represented mixtures containing genetic material from both parents [5, 39]. Isolated sporozoites from single infected acini with mixed infections were then used to infect peripheral blood mononuclear cells in vitro at very high dilutions to maximise probability that the cells were infected with a single sporozoite. The resulting schizont transformed cell lines were then twice cloned by limiting dilution to arrive at a clonal progeny . The schizont-infected cell lines containing the putative recombinants were used to infect the autologous animals from which the peripheral blood mononuclear cells were obtained originally and the piroplasms were isolated. A detailed protocol describing the complete procedure for isolation of clonal recombinant parasites is provided in the Additional file 13: Supplementary material and methods.
Piroplasm purification, DNA preparation and southern blotting
Piroplasm in suspension was lyzed by mixing gently in the presence of 10mM Tris-Cl, pH7.4, 100mM EDTA, 50ug/ulRNase A and 0.5% SDS and incubated at room temperature (23°C) for 10 min. Then Proteinase K is added to the mixture at 100ug/ml, mixed and incubated at 56°C for 4 hours. NaCl is added to 100mM and the mixture is extracted with one volume water-saturated phenol-chloroform-isoamyl alcohol (25/24/1) by shaking gently overnight at 4°C. After centrifugation at 5,000 rpm at 4°C for 20 min, the upper aqueous phase containing the DNA is transferred in a fresh tube and 2 volume of absolute ethanol chilled at −20°C is gently layered on top of the aqueous phase and the DNA is precipitated by returning the tube gently to mix. The supernatant is decanted and the DNA pellet is washed trice with large amount of 70% ethanol, air-dried then re-dissolved in 10mM Tris-Cl, pH 7.4, 1mM EDTA (TE) at 4°C overnight. Southern blotting onto Hybond N filters, probe labeling and hybridization was performed exactly as described previously .
Genotyping by whole genome sequencing
Genomic DAN samples were fragmented and sequenced with Roche 454 Titanium chemistry according to manufacturer’s protocols. All 454 read files have been submitted to NCBI Sequence Read Archive (SRA055319). Single nucleotide polymorphisms (SNPs) in the two parent and two recombinant progeny isolates were first called by mapping 454 reads to the Muguga genome using the Roche 454 Software GS Reference Mapper (Version 2.5.3). The high confidence SNP calls made by the software were further filtered to include only SNPs with depth of more than six non-duplicate reads and allele frequency over 50%. SNPs were also called independently by aligning the de novo assembly of each strain with the Muguga genome using MUMmer, which were filtered using the following rules: 1) supported by at least six non-duplicate reads; 2) had consensus quality score of 30 or above; 3) flanking regions (15 bp up and downstream) had less than 20% of bases whose consensus quality scores were less than 30. Insertions and deletions were not considered. One of the shortcomings of the 454 technology is the high error rate in calling homopolymers longer than 8 bp, which could reflect as erroneous non-synonymous mutations arising from frameshifts. SNPs within homopolymers were filtered out. Finally, the SNP sets were combined and those that fell into repetitive regions of Muguga genome were excluded as the final filtering step.
To determine if a SNP was inherited from Marikebuni/Uganda, MugugaMarikebuni and MugugaUganda progeny SNPs were compared with those of Marikebuni and Uganda parents, respectively. To identify alleles inherited from Muguga in the two progeny strains, for each SNP locus in Marikebuni/Uganda, the consensus sequence of the corresponding progeny strain at the same location was checked. If the consensus sequence was the same as in Muguga genome, and was supported by at least three non-duplicate reads, a Muguga allele was called in the progeny strain at the given location.
Detection of recombination events
For identification of COs, the sliding window method of 7 to 23 contiguous SNP markers [16, 20] and the sliding window method of 1 to 6 Kb were implemented and compared. The sliding window of 15 contiguous SNP markers was selected for further analysis for the reasons explained in the Results section. For every marker locus, allele frequency was calculated using the sliding window. A CO breakpoint was defined as the region where the allele frequency transitioned from 100% of one parental allele type to 100% of the other parental allele type. NCO breakpoints were strictly defined as a region consisting of 3 to 14 contiguous markers with one parental allele type, flanked by longer stretches of markers with the other parental allele type. The distance between events was calculated as the distance between the start sites of two consecutive breakpoints. The sizes of breakpoints were maximal distances defined by the closest markers with defined allele frequency changes (Additional file 14: Figure S7, materials and methods), which can be over-estimations due to the limited SNP information .
Identification of promoter regions
Promoter region boundaries were determined based on a previous study that identified regulatory motifs in T. parva. They were positioned 300 bp upstream of the start position of genes in the reference Muguga strain . For intergenic regions (IGR) shorter than 300 bp the entire IGR was considered as a promoter region. Regions upstream of genes that overlapped with another gene were ignored. Genes on each strand were treated separately. CO and NCO breakpoints that overlapped with the putative promoter regions were identified using the fjoin program .
Intra-species polymorphism analysis
Full-length mRNA sequences from T. parva Muguga genome were aligned to both de novo and mapped assemblies using BLAT . For every pair of fully aligned mRNA sequences, the Ka/Ks ratio, i.e. the ratio of the number of non-synonymous substitutions per non-synonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks), was calculated using Nei and Gojobori approximate method  implemented in the Ka/Ks Calculator . Genes with Ka/Ks ratios above average and the corresponding Fisher exact test P-value below 0.05 were selected as genes potentially under positive selection. Gene sequences were compared to the eukaryotic orthologous groups (KOGs)  using Reversed Position Specific BLAST (RPS Blast)  for functional group classifications.
Double strand break
East coast fever
- GS FLX:
Genome sequencer FLX
Kilo base pairs
Mega base pairs
Single nucleotide polymorphism
Variable number tandem repeat.
We are grateful for technical support from the Functional Genomics Centre Zurich funded by the ETH Zurich and the University of Zurich. Sonal Henson was funded by a postdoctoral fellowship from the Research Fellow Partnership Programme for Agriculture, Forestry and Natural Resources of the North–south Centre, ETH Zurich. We thank the insightful comments from two anonymous reviewers, which helped us greatly to improve the manuscripts. We are very thankful that Damer Blake and Declan McKeever for providing us with the VNTR typing results of their MugugaMarikebuni recombinants.
- Gardner MJ, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, Ren Q, Paulsen IT, Pain A, Berriman M: others: Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science. 2005, 309: 134-137. 10.1126/science.1110439.View ArticlePubMedGoogle Scholar
- Salih DA, El Hussein AM, Seitzer U, Ahmed JS, Salih DA, El Hussein AM, Seitzer U, Ahmed JS: Epidemiological studies on tick-borne diseases of cattle in Central Equatoria State, Southern Sudan. Parasitol Res. 2007, 101: 1035-1044. 10.1007/s00436-007-0583-y.View ArticlePubMedGoogle Scholar
- Norval RAI, Young A: Perry BD: The Epidemiology of Theileriosis in Africa. 1992, London: Academic Press LtdGoogle Scholar
- Gauer M, Mackenstedt U, Mehlhorn H, Schein E, Zapf F, Njenga E, Young A, Morzaria S: DNA measurements and ploidy determination of developmental stages in the life cycles of Theileria annulata and T. parva. Parasitol Res. 1995, 81: 565-574. 10.1007/BF00932023.View ArticlePubMedGoogle Scholar
- Bishop R, Geysen D, Skilton R, Odongo D, Nene V, Allsopp B, Mbogo S, Spooner P, Morzaria S: Genomic Polymorphism, Sexual Recombination and Molecular Epidemiology of Theileria parva. In Theileria. Edited by: McKeever DJ, Kluwer DD. 2002, The Netherlands: Academic, 23-40.View ArticleGoogle Scholar
- Goddeeris BM, Morrison WI, Teale AJ: Generation of bovine cytotoxic cell lines, specific for cells infected with the protozoan parasite Theileria parva and restricted by products of the major histocompatibility complex. Eur J Immunol. 1986, 16: 1243-1249. 10.1002/eji.1830161010.View ArticlePubMedGoogle Scholar
- Taracha EL, Goddeeris BM, Morzaria SP, Morrison WI: Parasite strain specificity of precursor cytotoxic T cells in individual animals correlates with cross-protection in cattle challenged with Theileria parva. Infect Immun. 1995, 63: 1258-1262.PubMed CentralPubMedGoogle Scholar
- Connelley TK, MacHugh ND, Pelle R, Weir W, Morrison WI: Escape from CD8+ T cell response by natural variants of an immunodominant epitope from Theileria parva is predominantly due to loss of TCR recognition. J Immunol. 2011, 187: 5910-5920. 10.4049/jimmunol.1102009.View ArticlePubMedGoogle Scholar
- Conrad PA, Ole-MoiYoi OK, Baldwin CL, Dolan TT, O’Callaghan CJ, Njamunggeh RE, Grootenhuis JG, Stagg DA, Leitch BL, Young AS: Characterization of buffalo-derived theilerial parasites with monoclonal antibodies and DNA probes. Parasitology. 1989, 98 (Pt 2): 179-188.View ArticlePubMedGoogle Scholar
- Katzer F, Lizundia R, Ngugi D, Blake D, McKeever D: Construction of a genetic map for Theileria parva: identification of hotspots of recombination. Int J Parasitol. 2011, 41: 669-675. 10.1016/j.ijpara.2011.01.001.PubMed CentralView ArticlePubMedGoogle Scholar
- Oura CA, Odongo DO, Lubega GW, Spooner PR, Tait A, Bishop RP: A panel of microsatellite and minisatellite markers for the characterisation of field isolates of Theileria parva. Int J Parasitol. 2003, 33: 1641-1653. 10.1016/S0020-7519(03)00280-7.View ArticlePubMedGoogle Scholar
- Pelle R, Graham SP, Njahira MN, Osaso J, Saya RM, Odongo DO, Toye PG, Spooner PR, Musoke AJ, Mwangi DM, Taracha ELN, Morrison WI, Weir W, Silva JC, Bishop RP: Two Theileria parva CD8 T cell antigen genes are more variable in buffalo than cattle parasites, but differ in pattern of sequence diversity. PLoS One. 2011, 6: e19015-10.1371/journal.pone.0019015.PubMed CentralView ArticlePubMedGoogle Scholar
- Katzer F, Ngugi D, Walker AR, McKeever DJ: Genotypic diversity, a survival strategy for the apicomplexan parasite Theileria parva. Vet Parasitol. 2010, 167: 236-243. 10.1016/j.vetpar.2009.09.025.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee Y-S, Chao A, Chen C-H, Chou T, Wang S-YM, Wang T-H: Analysis of human meiotic recombination events with a parent-sibling tracing approach. BMC Genomics. 2011, 12: 434-10.1186/1471-2164-12-434.PubMed CentralView ArticlePubMedGoogle Scholar
- Ting JC, Roberson EDO, Currier DG, Pevsner J: Locations and patterns of meiotic recombination in two-generation pedigrees. BMC Med Genet. 2009, 10: 93-10.1186/1471-2350-10-93.PubMed CentralView ArticlePubMedGoogle Scholar
- Huang X, Feng Q, Qian Q, Zhao Q, Wang L, Wang A, Guan J, Fan D, Weng Q, Huang T, Dong G, Sang T, Han B: High-throughput genotyping by whole-genome resequencing. Genome Res. 2009, 19: 1068-1076. 10.1101/gr.089516.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM, et al: High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature. 2008, 454: 479-485. 10.1038/nature07135.PubMed CentralView ArticlePubMedGoogle Scholar
- Qi J, Wijeratne AJ, Tomsho LP, Hu Y, Schuster SC, Ma H: Characterization of meiotic crossovers and gene conversion by whole-genome sequencing in Saccharomyces cerevisiae. BMC Genomics. 2009, 10: 475-10.1186/1471-2164-10-475.PubMed CentralView ArticlePubMedGoogle Scholar
- Bontell IL, Hall N, Ashelford KE, Dubey JP, Boyle JP, Lindh J, Smith JE: Whole genome sequencing of a natural recombinant Toxoplasma gondii strain reveals chromosome sorting and local allelic variants. Genome Biol. 2009, 10: R53-10.1186/gb-2009-10-5-r53.PubMed CentralView ArticlePubMedGoogle Scholar
- Samarakoon U, Regier A, Tan A, Desany BA, Collins B, Tan JC, Emrich SJ, Ferdig MT: High-throughput 454 resequencing for allele discovery and recombination mapping in Plasmodium falciparum. BMC Genomics. 2011, 12: 116-10.1186/1471-2164-12-116.PubMed CentralView ArticlePubMedGoogle Scholar
- Bishop R, Gobright E, Nene V, Morzaria S, Musoke A, Sohanpal B: Polymorphic open reading frames encoding secretory proteins are located less than 3 kilobases from Theileria parva telomeres. Mol Biochem Parasitol. 2000, 110: 359-371. 10.1016/S0166-6851(00)00291-7.View ArticlePubMedGoogle Scholar
- Hinterberg K, Mattei D, Wellems TE, Scherf A: Interchromosomal exchange of a large subtelomeric segment in a Plasmodium falciparum cross. EMBO J. 1994, 13: 4174-4180.PubMed CentralPubMedGoogle Scholar
- Pain A, Renauld H, Berriman M, Murphy L, Yeats CA, Weir W, Kerhornou A, Aslett M, Bishop R, Bouchier C, Cochet M, Coulson RMR, Cronin A, de Villiers EP, Fraser A, Fosker N, Gardner M, Goble A, Griffiths-Jones S, Harris DE, Katzer F, Larke N, Lord A, Maser P, McKellar S, Mooney P, Morton F, Nene V, O’Neil S, Price C, et al: Genome of the host-cell transforming parasite Theileria annulata compared with T. parva. Science (New York, N.Y.). 2005, 309: 131-133. 10.1126/science.1110418.View ArticleGoogle Scholar
- Chen J-M, Cooper DN, Chuzhanova N, Férec C, Patrinos GP: Gene conversion: mechanisms, evolution and human disease. Nat Rev Genet. 2007, 8: 762-775.View ArticlePubMedGoogle Scholar
- Whitby MC: Making crossovers during meiosis. Biochem Soc Trans. 2005, 33: 1451-1455.View ArticlePubMedGoogle Scholar
- Martinez-Perez E, Colaiácovo MP: Distribution of meiotic recombination events: talking to your neighbors. Curr Opin Genet Dev. 2009, 19: 105-112. 10.1016/j.gde.2009.02.005.PubMed CentralView ArticlePubMedGoogle Scholar
- Jeffreys AJ, May CA: Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nat Genet. 2004, 36: 151-156. 10.1038/ng1287.View ArticlePubMedGoogle Scholar
- Terasawa M, Ogawa H, Tsukamoto Y, Shinohara M, Shirahige K, Kleckner N, Ogawa T: Meiotic recombination-related DNA synthesis and its implications for cross-over and non-cross-over recombinant formation. Proc Natl Acad Sci USA. 2007, 104: 5965-5970. 10.1073/pnas.0611490104.PubMed CentralView ArticlePubMedGoogle Scholar
- Borts RH, Haber JE: Length and distribution of meiotic gene conversion tracts and crossovers in Saccharomyces cerevisiae. Genetics. 1989, 123: 69-80.PubMed CentralPubMedGoogle Scholar
- Greig D, Travisano M, Louis EJ, Borts RH: A role for the mismatch repair system during incipient speciation in Saccharomyces. J Evol Biol. 2003, 16: 429-437. 10.1046/j.1420-9101.2003.00546.x.View ArticlePubMedGoogle Scholar
- Baudat F, Nicolas A: Clustering of meiotic double-strand breaks on yeast chromosome III. Proc Natl Acad Sci USA. 1997, 94: 5213-5218. 10.1073/pnas.94.10.5213.PubMed CentralView ArticlePubMedGoogle Scholar
- Gerton JL, DeRisi J, Shroff R, Lichten M, Brown PO, Petes TD: Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2000, 97: 11383-11390. 10.1073/pnas.97.21.11383.PubMed CentralView ArticlePubMedGoogle Scholar
- Guo X, Silva JC: Properties of non-coding DNA and identification of putative cis-regulatory elements in Theileria parva. BMC Genomics. 2008, 9: 582-10.1186/1471-2164-9-582.PubMed CentralView ArticlePubMedGoogle Scholar
- Weir W, Karagenç T, Baird M, Tait A, Shiels BR: Evolution and diversity of secretome genes in the apicomplexan parasite Theileria annulata. BMC Genomics. 2010, 11: 42-10.1186/1471-2164-11-42.PubMed CentralView ArticlePubMedGoogle Scholar
- Graham SP, Honda Y, Pellé R, Mwangi DM, Glew EJ, de Villiers EP, Shah T, Bishop R, van der Bruggen P, Nene V, Taracha ELN: A novel strategy for the identification of antigens that are recognised by bovine MHC class I restricted cytotoxic T cells in a protozoan infection using reverse vaccinology. Immunome Res. 2007, 3: 2-10.1186/1745-7580-3-2.PubMed CentralView ArticlePubMedGoogle Scholar
- Knight PA, Williamson SW, Brown CG, Bell-Sakyi L, Kirvar E, Kinnaird J, Hussain K, Hall R, Boulter N, Tait A: Identification of a Theileria annulata antigen expressed in multiple stages of the parasite life cycle. Exp Parasitol. 1998, 90: 110-121. 10.1006/expr.1998.4303.View ArticlePubMedGoogle Scholar
- Di Giulio G, Lynen G, Morzaria S, Oura C, Bishop R: Live immunization against East Coast fever–current status. Trends Parasitol. 2009, 25: 85-92. 10.1016/j.pt.2008.11.007.View ArticlePubMedGoogle Scholar
- Morzaria SP, Dolan TT, Norval RA, Bishop RP, Spooner PR: Generation and characterization of cloned Theileria parva parasites. Parasitology. 1995, 111 (Pt 1): 39-49.View ArticlePubMedGoogle Scholar
- Morzaria SP, Young JR, Spooner PR, Dolan TT, Bishop RP: Theileria parva: a restriction map and genetic recombination. Genome Analysis of Protozoan Parasites: Proceedings of a workshop Held at ILRAD, Nairobi, Kenya, 11-13 November, 1992. Edited by: Morzaria SP. 1993, Nairobi: The International Laboratory for Research on Animal Diseases, 67-73.Google Scholar
- Allsopp B, Carrington M, Baylis H, Sohal S, Dolan T, Iams K: Improved characterization of Theileria parva isolates using the polymerase chain reaction and oligonucleotide probes. Mol Biochem Parasitol. 1989, 35: 137-147. 10.1016/0166-6851(89)90116-3.View ArticlePubMedGoogle Scholar
- Bishop RP, Sohanpal BK, Allsopp BA, Spooner PR, Dolan TT, Morzaria SP: Detection of polymorphisms among Theileria parva stocks using repetitive, telomeric and ribosomal DNA probes and anti-schizont monoclonal antibodies. Parasitology. 1993, 31 (Pt 1): 19-31.View ArticleGoogle Scholar
- Richardson JE: fjoin: simple and efficient computation of feature overlaps. J Comput Biol. 2006, 13: 1457-1464. 10.1089/cmb.2006.13.1457.View ArticlePubMedGoogle Scholar
- Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.PubMed CentralView ArticlePubMedGoogle Scholar
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.PubMedGoogle Scholar
- Zhang Z, Li J, Zhao X-Q, Wang J, Wong GK-S, Yu J: KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinf. 2006, 4: 259-263. 10.1016/S1672-0229(07)60007-2.View ArticleGoogle Scholar
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41-10.1186/1471-2105-4-41.PubMed CentralView ArticlePubMedGoogle Scholar
- Marchler-Bauer A, Panchenko AR, Shoemaker BA, Thiessen PA, Geer LY, Bryant SH: CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 2002, 30: 281-283. 10.1093/nar/30.1.281.PubMed CentralView ArticlePubMedGoogle Scholar
- Sohanpal BK, Morzaria SP, Gobright EI, Bishop RP: Characterisation of the telomeres at opposite ends of a 3 Mb Theileria parva chromosome. Nucleic Acids Res. 1995, 23: 1942-1947. 10.1093/nar/23.11.1942.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.