- Research article
- Open Access
Comparative genomic analysis reveals occurrence of genetic recombination in virulent Cryptosporidium hominis subtypes and telomeric gene duplications in Cryptosporidium parvum
- Yaqiong Guo†1, 2,
- Kevin Tang†3,
- Lori A Rowe3,
- Na Li1,
- Dawn M Roellig2,
- Kristine Knipe3,
- Michael Frace3,
- Chunfu Yang4,
- Yaoyu Feng1 and
- Lihua Xiao2
© Guo et al.; licensee BioMed Central. 2015
- Received: 17 October 2014
- Accepted: 10 April 2015
- Published: 18 April 2015
Cryptosporidium hominis is a dominant species for human cryptosporidiosis. Within the species, IbA10G2 is the most virulent subtype responsible for all C. hominis–associated outbreaks in Europe and Australia, and is a dominant outbreak subtype in the United States. In recent years, IaA28R4 has emerged as a major new subtype in the United States. In this study, we sequenced the genomes of two field specimens from each of the two subtypes and conducted a comparative genomic analysis of the obtained sequences with those from the only fully sequenced Cryptosporidium parvum genome.
Altogether, 8.59-9.05 Mb of Cryptosporidium sequences in 45–767 assembled contigs were obtained from the four specimens, representing 94.36-99.47% coverage of the expected genome. These genomes had complete synteny in gene organization and 96.86-97.0% and 99.72-99.83% nucleotide sequence similarities to the published genomes of C. parvum and C. hominis, respectively. Several major insertions and deletions were seen between C. hominis and C. parvum genomes, involving mostly members of multicopy gene families near telomeres. The four C. hominis genomes were highly similar to each other and divergent from the reference IaA25R3 genome in some highly polymorphic regions. Major sequence differences among the four specimens sequenced in this study were in the 5′ and 3′ ends of chromosome 6 and the gp60 region, largely the result of genetic recombination.
The sequence similarity among specimens of the two dominant outbreak subtypes and genetic recombination in chromosome 6, especially around the putative virulence determinant gp60 region, suggest that genetic recombination plays a potential role in the emergence of hyper-transmissible C. hominis subtypes. The high sequence conservation between C. parvum and C. hominis genomes and significant differences in copy numbers of MEDLE family secreted proteins and insulinase-like proteases indicate that telomeric gene duplications could potentially contribute to host expansion in C. parvum.
- Whole genome sequencing
- Genetic recombination
Cryptosporidium spp. inhabit the brush borders of the gastrointestinal and respiratory epithelium of various vertebrates, causing enterocolitis, diarrhea, and cholangiopathy in humans . Immunocompetent children and adults with cryptosporidiosis usually have a short-term illness accompanied by watery diarrhea, nausea, vomiting, and weight loss. In immunocompromised persons, however, the infection can be protracted and life-threatening . Cryptosporidiosis is one of the most important causes of moderate-to-severe diarrhea and diarrhea-associated deaths in children in developing countries  and a major cause for waterborne and foodborne outbreaks of human illness in industrialized nations [4,5]. In the United States the number of reported cases of cryptosporidiosis has increased more than twofold since 2005 [6-9]. Currently, it is estimated that there are approximately 750,000 annual cases of cryptosporidiosis in the United States .
Among the many established Cryptosporidium species and genotypes, C. hominis and C. parvum are the two responsible for greater than 90% of the human cryptosporidiosis cases in most countries. C. hominis is largely human-specific and responsible for anthroponotic transmission of cryptosporidiosis. C. parvum infects both humans and some farm animals, especially pre-weaned calves and lambs and thus can be transmitted both anthroponotically and zoonotically . Within C. hominis, subtype IbA10G2 is the dominant strain for C. hominis-associated waterborne outbreaks of cryptosporidiosis in the United States, Europe, and Australia [10-16]. The dominant subtype associated with waterborne cryptosporidiosis outbreaks in the United States since 2005 is a new subtype, IaA28R4 [17-20].
Whole genome sequencing of Cryptosporidium spp. has greatly facilitated the development of genotyping, subtyping and multilocus sequence typing (MLST) tools for characterizing the transmission of C. hominis and C. parvum [21,22]. These tools have played a major role in improving our understanding of cryptosporidiosis epidemiology [10,23]. Nevertheless, genomic studies of Cryptosporidium spp. lag far behind those on other related apicomplexan parasites largely because of the lack of effective cultivation and animal models. Thus far, only the genomes of one laboratory isolate each of C. parvum, C. hominis, and C. muris have been sequenced using traditional Sanger sequencing technology [22,24,25]. More recently, the genome of an anthroponotic II subtype (IIcA5G3b) of C. parvum serially propagated in immunosuppressed mice has been sequenced using Illumina technology . The lack of whole genome sequence data, especially from field specimens obtained from outbreaks, has hampered our understanding of genetic determinants for host specificity, virulence, and the biological fitness of various Cryptosporidium species and C. parvum and C. hominis subtypes.
In this study, we sequenced the genomes of two dominant outbreak subtypes (IbA10G2 and IaA28R4) of C. hominis by using 454 and Illumina technologies. Prior to sequencing, oocysts were isolated directly from field specimens without propagation in laboratory animals, and extracted DNA was amplified to generate enough material for sequencing. Results of this study have (1) filled some gaps in our understanding of Cryptosporidium genomics, (2) identified some major deletions and one large insertion in the C. hominis genome, and (3) showed the high genetic similarity of the two outbreak subtypes. We have also demonstrated the occurrence of genetic recombination in chromosome 6.
Cryptosporidium hominis sequence data and de novo assemblies
Summary of sequence data from whole genome sequencing of four Cryptosporidium hominis specimens in comparison with data from the published C. hominis (TU502) and C. parvum (IOWA) genomes
Specimen (gp60 subtype)
Total sequence reads
Average coverage (fold)
# of Contigs
Illumina Genome Analyzer IIx 100 bp paired end
Illumina Genome Analyzer IIx 100 bp paired end
454 GS-FLX Titanium
454 GS-FLX Titanium
C. hominis TU502 (IaA25R3)
C. parvum IOWA (IIaA15G2R1)
Genome coverage and bacterial contamination
Coverage of four Cryptosporidium hominis genomes sequenced in this study and sequence similarities to published C. parvum (IOWA) and C. hominis (TU502) genomes
Column headers: C. parvum length (bp); Similarity to IOWA (%); Similarity to TU502 (%) (the two similarity columns repeat across the compared genomes)
Most of the 14 unmapped contigs from specimen 37999 were small (≤1,824 bp) and were sequences of multicopy genes (e.g., rRNA units) and genes with paralogs in the genome or large repetitive sequences (e.g., fatty acid synthase, cgd5_1210, and cgd5_1220). However, sequences of four contigs (#45, 66, 74, and 77) had no similarity to any published sequences, and one contig (#76) had 95% sequence similarity to a 500-bp region of the genome of Stenotrophomonas maltophilia (CP002986). Similarly, most of the unmapped contigs from specimen 30974 were small (≤3,170 bp) and were sequences of multicopy genes (e.g., rRNA units), genes with paralogs in the genome and large repetitive sequences (e.g., fatty acid synthase and cgd5_2180), and telomeric sequences of Cryptosporidium. Sequences of 18 contigs (#303, 357, 392, 415, 416, 436, 492, 503, 521, 524, 529, 537, 542, 543, 551, 562, 563, and 564) had no similarity to any published Cryptosporidium sequences, and one of them (contig #392) had 98-100% sequence similarity to Bacteroides fragilis plasmids from humans (AB646744 and U25716). Similar observations were made for TU502. In addition, the 547 bp at the 5′ end of contig AAEL01000108 (19,113 bp) had 98% sequence similarity to cgd3_530 on chromosome 3, while the remaining part of the sequence mapped to chromosome 8. Similarly, the 5′ (15,709-bp) region of contig AAEL01000024 (36,266 bp in length) mapped to chromosome 7, the 3′ region (nucleotides 25,790-36,266) mapped to chromosome 2, while the middle region containing the rRNA unit mapped to chromosomes 1, 2, 7, and 8.
In contrast, most of the unmapped contigs from IaA28R4 specimens 30976 and 33537 had non-Cryptosporidium sequences. For example, the largest 100 unmapped contigs (16,411-138,945 bp) from specimen 33537 were 99-100% similar to the genome (CP006252) of the enterobacteria Serratia liquefaciens, with the exception of contig 0018 (94,132 bp), which was from its plasmid. As the genome of S. liquefaciens is about 5.2 Mb, the 1,464 contigs of 14,065,231 bp from specimen 33537 were from the combined C. hominis and S. liquefaciens genomes, with all S. liquefaciens contigs positioned behind the mapped Cryptosporidium sequences (Additional file 1: Figure S1). Evidence of contamination from several bacterial species was present in data from specimen 30976, as the 6,140 contigs totaled 22.13 Mb, which is larger than the combined genomes of C. hominis and one bacterial species. BLAST analysis of contigs indicated that ~28% of the total nucleotides were from members of Enterobacteriaceae and 8% from Bacteroidaceae. The 20 largest unmapped contigs (88,676-515,888 bp) had 75-85% sequence similarities to genomes of members (Serratia, Yersinia, Klebsiella, E. coli, Salmonella, etc.) of Enterobacteriaceae, except for one (contig #51), which had a 98% sequence similarity to a 21,307 bp region of an uncultured organism from the human gut (GQ873945).
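The size argument for specimen 33537 can be checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, the inputs are the two figures quoted in the text, and subtracting a rounded genome size is an approximation that ignores the plasmid contig and any unassembled regions.

```python
def residual_after_contaminant(total_assembly_bp: int, contaminant_genome_bp: int) -> int:
    """Bases left in an assembly after subtracting one contaminant genome.

    Hypothetical helper for illustration; not part of the published pipeline.
    """
    return total_assembly_bp - contaminant_genome_bp


# Figures quoted in the text for specimen 33537.
TOTAL_33537_BP = 14_065_231      # all 1,464 contigs
S_LIQUEFACIENS_BP = 5_200_000    # "about 5.2 Mb" (rounded)

residual_bp = residual_after_contaminant(TOTAL_33537_BP, S_LIQUEFACIENS_BP)
residual_mb = residual_bp / 1e6
print(f"Residual after removing S. liquefaciens: {residual_mb:.2f} Mb")
```

The residual of roughly 8.87 Mb falls within the 8.59-9.05 Mb of Cryptosporidium sequence reported for the four specimens, which is consistent with a single bacterial contaminant in this specimen.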
Sequence similarity to published C. parvum genome and physical characteristic of C. hominis genomes
Coverage of two Illumina-sequenced Cryptosporidium hominis genomes in sequence gaps of the published C. parvum IOWA genome
Gap in C. parvum IOWA (bp)
Sequence length in C. hominis specimen (bp)
>538 (ending with telomeric repeats)
>857 (ending with telomeric repeats)
19,048 bp deletion spanning entire gap
19,048 bp deletion spanning entire gap
C. parvum- and C. hominis- specific sequences
Species-specific genes in genomes of Cryptosporidium parvum and C. hominis
cgd8_680, cgd8_690 and other potential genes in 10,000 bp sequence gap
cgd6_5480, cgd6_5490, cgd6_5510, cgd6_5520
cgd5_4580, cgd5_4590, cgd5_4610
In contrast, contig AAEL01000728 is 2,277 bp in length (23% GC) and mapped to contig 257 of specimen 30974 and contigs 1238, 1367, and 1487 of specimen 33537. It is located in chromosome 5 of specimens 30976 (contig_6) and 37999 (contig_1), and within a sequence gap area in the C. parvum IOWA genome. PCR analysis using primers based on the AAEL01000728 sequence amplified DNA of C. hominis, C. parvum, and C. andersoni. The sequences from C. parvum and C. hominis differed from each other by two nucleotides in the 413-bp region, and the sequence from C. andersoni had 97% sequence similarity to nucleotides 19,798-20,202 of XM_002142452 (coding for a large hypothetical protein, CMU_010870) from C. muris (data not shown). Contig AAEL01000717, which contains the sensor histidine kinase gene (Chro.00003, nucleotides 673–2,319), was probably not of Cryptosporidium origin. It is 2,333 bp in length and has a 66% GC content. It has no equivalents in the published C. parvum and C. muris genomes or in the C. hominis genomes sequenced in the present study, but has a 77% sequence similarity to the sensor histidine kinase gene of Rhizobium etli (nucleotides 1,334,700-1,334,429 of CP001074). PCR primers based on this sequence did not amplify DNA of C. parvum or C. hominis (data not shown).
Sequence similarity to published C. hominis genomic data
Highly polymorphic loci in Cryptosporidium hominis genomes
Contig in 30976
SNP/kb (30976 vs Tu502)
Gene in C. hominis*
Ortholog in C. parvum*
Hypothetical protein with a signal peptide
Conserved hypothetical protein with a signal peptide
Mucin glycoprotein with a signal peptide
Intergenic downstream of Chro.20394
Intergenic downstream of cgd2_3690
WD repeat protein (cgd2_3690)
Intergenic downstream of Chro.30096
Very large mucin with a signal peptide
Hypothetical conserved protein
contig_293 + contig_255
Hypothetical protein with a signal peptide
Large uncharacterized protein
Sacsin-like HSP90 chaperone domain
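The SNP/kb column in the table above is a density: the number of SNPs in a region scaled to 1,000 bp. As a rough illustration of how such densities can be scanned along a contig to flag highly polymorphic loci, the sketch below counts SNPs in consecutive non-overlapping windows. The function name, the 1-kb window, and the example positions are all assumptions made for illustration and are not taken from the study.

```python
from collections import Counter
from typing import Iterable, List


def windowed_snp_density(positions: Iterable[int], contig_length: int,
                         window: int = 1000) -> List[float]:
    """SNPs per kb in consecutive non-overlapping windows.

    positions are 1-based SNP coordinates on the contig. The last window
    may be shorter than `window`; its count is scaled to its real length.
    """
    if contig_length <= 0 or window <= 0:
        raise ValueError("contig_length and window must be positive")
    counts = Counter((p - 1) // window for p in positions)
    n_windows = -(-contig_length // window)  # ceiling division
    densities = []
    for i in range(n_windows):
        start = i * window
        end = min(start + window, contig_length)
        densities.append(counts.get(i, 0) * 1000 / (end - start))
    return densities


# Made-up SNP positions on a made-up 2,500-bp contig.
example = windowed_snp_density([10, 500, 999, 1000, 1500, 2400], 2500)
print(example)  # [4.0, 1.0, 2.0]
```

Scaling the final, shorter window to its true length keeps the densities comparable across windows.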
Sequence similarity among sequenced C. hominis genomes and occurrence of genetic recombination
Genetic recombination in chromosome 6 of two virulent Cryptosporidium hominis subtypes
5′ end (cgd6_60)
gp60 area (cgd6_1000-cgd6_1100)
3′ end (cgd6_5240-cgd6_5320)
Intra-specimen sequence diversity at the trinucleotide repeat region of gp60
Because of a recent report on intra-specimen genetic heterogeneity seen in Illumina sequencing of a PCR-WGA product from a C. parvum specimen, we examined intra-specimen sequence diversity at the trinucleotide repeat region of the gp60 gene in the four C. hominis specimens sequenced in this study. In the specimens sequenced by using 454 technology, 205 and 310 sequence reads mapped to the gp60 gene for specimens 30974 and 33537, respectively. Among them, 78 and 59 reads had sequences fully covering the entire trinucleotide repeats for the IbA10G2 and IaA28R4 subtypes, respectively. No intra-specimen sequence diversity was seen (Additional file 4: Figure S4, Additional file 5: Figure S5). Similarly, 2,781 and 5,576 sequence reads mapped to the gp60 gene in specimens 37999 and 30976 sequenced by using Illumina, respectively. Among them, 73 and 30 reads had sequences fully covering the entire trinucleotide repeats for the IbA10G2 and IaA28R4 subtypes, respectively. No intra-specimen diversity was seen in specimen 37999, whereas in 30976 (of the IaA28R4 subtype), 28 reads had 28 copies of the TCA repeat, one had 27 copies of the TCA repeat, and one had 29 copies of the TCA repeat (data not shown).
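A tally of this kind can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes the reads have already been restricted to those spanning the whole repeat region, it counts only the longest uninterrupted run of TCA, and the flanking sequences in the synthetic reads are invented.

```python
import re
from collections import Counter
from typing import Iterable

TCA_RUN = re.compile(r"(?:TCA)+")


def longest_tca_run(read: str) -> int:
    """Number of TCA copies in the longest uninterrupted TCA run of a read."""
    runs = TCA_RUN.findall(read.upper())
    return max((len(run) // 3 for run in runs), default=0)


def tally_repeat_counts(reads: Iterable[str]) -> Counter:
    """Map TCA copy number to the number of reads showing it."""
    return Counter(longest_tca_run(read) for read in reads)


def synthetic_read(copies: int) -> str:
    """Synthetic read with invented flanks (hypothetical, for the demo only)."""
    return "GATTC" + "TCA" * copies + "GGAAC"


# Mimic the distribution reported for specimen 30976: 28, 1, and 1 reads.
reads = [synthetic_read(28)] * 28 + [synthetic_read(27), synthetic_read(29)]
tally = tally_repeat_counts(reads)
print(sorted(tally.items()))  # [(27, 1), (28, 28), (29, 1)]
```

With 28 of 30 reads agreeing, the two outlying reads differ from the majority by a single repeat unit each.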
Genome similarity between C. hominis and C. parvum, gene deletions, and species-specific genes
Results of this study have confirmed the genetic similarity between the almost fully sequenced C. parvum and C. hominis genomes. The genomes of the two species are nearly 97% similar in nucleotide sequences, with complete synteny in gene organization. This is similar to the previous conclusion based on comparison of the fully assembled genome of the C. parvum IOWA isolate and the more fragmented genome from the C. hominis TU502 isolate [24,25]. Some potential genetic rearrangements in several chromosomes were observed in the current study, but they all occurred in the ten sequence gaps and several sequence ambiguity areas in the reference C. parvum genome. As there are no HAPPY maps and genomic libraries with large inserts for C. hominis, the observations on genome organization of C. hominis need to be supported by PacBio sequencing. Nevertheless, comparative genomic analysis in this study has identified several major deletions and one insertion in C. hominis, which were overlooked in previous studies probably because of the fragmented nature of the published C. hominis genome. The significance of these gene insertions and deletions (indels) is not clear. Because of the high sequence similarity in most genes between C. parvum and C. hominis, these major indels could potentially be responsible for some biological differences between C. parvum and C. hominis.
Gene duplication and interallelic recombination could contribute to the gene expansion and losses seen between C. parvum and C. hominis genomes. Most of the genes deleted in the C. hominis genome are members of multigene families and have paralogs nearby. Thus, of the six genes of the MEDLE family of secreted proteins possibly present in tandem in C. parvum (cgd5_4580, cgd5_4590, cgd5_4600, cgd5_4610, cgd6_5480, and cgd6_5490), only one, the ortholog of cgd5_4600, is present in C. hominis. Similarly, two genes (cgd6_5510 and cgd6_5520) that code for insulinase-like proteases are absent in C. hominis. The subtelomeric locations of these genes facilitate the expansion and deletions of multicopy genes by interallelic recombination. Sequence homology is probably also involved in the loss of cgd8_680 and cgd8_690 orthologs in chromosome 8 of C. hominis, as the ~100 bp region upstream of the fragment containing the two genes and the ~100 bp region downstream of the fragment have almost identical sequences. The sequence homology in two nearby regions could have resulted in the deletion of the two genes in C. hominis during species evolution. As cgd8_690 is a paralog of cgd8_660 and has some sequence similarity to the 5′ end of cgd8_670, this gene loss in chromosome 8 of C. hominis also involves a multigene family. In compact apicomplexan genomes with mostly single copy genes, members of multigene families usually play very important biological functions [31,32]. The function of the MEDLE family of secreted proteins in apicomplexan parasites has not been examined. However, insulinase-like proteases have been shown recently to be rhoptry or microneme-associated in Toxoplasma gondii and are probably involved in cell invasion [33,34]. Indeed, both cgd6_5510 and cgd6_5520 have peak expression during the invasion process. The expression of cgd6_5480 and cgd6_5490 in C. parvum may also be developmentally regulated, as they showed identical expression patterns in in vitro culture.
As sequence differences in non-coding regulatory elements can also affect the timing or expression levels of invasion-associated proteins, more studies are needed to determine whether the duplications of MEDLE and insulinase genes are indeed the cause of the host expansion in C. parvum.
Compared to the deletion of at least nine genes, C. hominis appears to have only one unique gene that is absent in C. parvum. This gene, Chro.50011, is located at the 3′ end of chromosome 3 instead of the original annotation at the 5′ end of chromosome 5 (Figure 2A). It codes for a 489 aa hypothetical protein that contains RS and HS repeats at the carboxyl end, and has recently been identified as a C. hominis–specific gene, Chos-1, by Bouzid and colleagues . Although the function of the protein is not clear, it has been suggested that this protein is a member of a new Cryptosporidium-specific protein family that are candidate mediators of host specificity and virulence . It remains to be determined whether the C. hominis genome codes for additional species-specific genes in areas of the ten sequence gaps in the C. parvum IOWA genome.
Sequence similarity among C. hominis genomes and genetic recombination in virulent C. hominis subtypes
As expected, much higher genetic similarity is present among C. hominis genomes. The four C. hominis specimens sequenced in this study had whole genome sequences that are 99.72-99.83% similar to the published C. hominis genome from TU502. Genes coding for some secreted proteins (especially mucins) and proteases contribute more to the sequence differences than others, suggesting they are under selection and therefore may serve as good targets for the development of diagnostic tools and intervention measures. For example, some of the polymorphic mucin genes such as cgd2_430 (Mucin5) and cgd6_1080 (gp60) are well known targets of host immune responses [36,37] and have been used widely in subtyping C. parvum and C. hominis . Proteases (especially cysteine proteases) and protein kinases have been recently shown to play important roles in cell invasion of Cryptosporidium and thus have been used as common targets in the development of therapeutic treatments [38-41].
In contrast to the relatively high nucleotide sequence differences between the genomes sequenced in this study and the published C. hominis TU502 genome, the genomes of four specimens from two virulent C. hominis subtypes (IbA10G2 and IaA28R4) in the United States are very similar to each other except for the 3′ end of chromosome 1 and three areas in chromosome 6. In particular, chimeric sequences were seen in chromosome 6 (Table 6), indicating the occurrence of genetic recombination in the two subtypes. One of the three areas with genetic recombination is where gp60 (cgd6_1080) is located, a locus widely known for its extremely high sequence diversity and occurrence of genetic recombination. Recently, population genetic analyses of chromosome 6 sequences have shown the exclusive occurrence of genetic recombination in the virulent C. hominis subtypes IbA10G2 and IaA28R4, especially around gp60 [43,44]. It was postulated that the fitness of the two subtypes as a result of genetic recombination was likely responsible for the wide dissemination of IbA10G2 around the world and the emergence of IaA28R4 in the United States. The two IbA10G2 specimens sequenced in this study also differ from each other at the 5′ end of chromosome 6, especially in the ortholog of cgd6_60 (coding for a protease), as a result of genetic recombination. It was previously shown by MLST analysis of chromosome 6 that IbA10G2 specimens from different areas are genetically different. Although the two IaA28R4 specimens sequenced in this study are mostly identical, data from a recent population genetic study of IaA28R4 specimens in the United States suggest that there are at least two origins of the subtype. Therefore, multiple genetic recombination events are probably involved in the evolution of both IbA10G2 and IaA28R4 and are likely responsible for the observed emergence of the same virulent gp60 subtypes in different geographical locations in response to selection pressure.
The occurrence of genetic recombination in virulent C. hominis subtypes also suggests that the widely used gp60-based typing alone is insufficient in molecular epidemiologic characterizations of field specimens, as pointed out previously . Therefore, the use of MLST and other multilocus subtyping tools can provide new insights into the transmission of Cryptosporidium spp. [44,47-49]. As expected, the three loci in chromosome 6 where genetic recombination occurs, cgd6_60, cgd6_1080, and cgd6_5270 (coding for a hypothetical protein with a signal peptide and paralogs) are all highly polymorphic in the present study. The biological functions of proteins coded by cgd6_60 and cgd6_5270 thus should be studied.
In conclusion, this comparative genomic analysis has revealed some unique genetic differences between C. parvum and C. hominis and identified some multigene families that can potentially contribute to differences in host specificity of the two closely related species. It has further supported the potential role of genetic recombination in the emergence and evolution of virulent C. hominis subtypes. Improvements in knowledge in these two areas are still hampered by the lack of genomic studies of other Cryptosporidium species of significant public health and economic importance, the incompleteness of the reference C. parvum and C. hominis genomes, and poor understanding of the functions of the thousands of hypothetical proteins in Cryptosporidium genomes and regulatory elements in non-coding areas. With the increased recognition of the importance of cryptosporidiosis in pediatric health in developing countries , common occurrence of large waterborne outbreaks in industrialized nations [15,16,50], and a major increase in cryptosporidiosis incidence in the United States in recent years [6,8,9], more effort should be directed toward studies on functional genomics and the basic biology of Cryptosporidium spp. .
Four C. hominis specimens were used in whole genome sequencing in the study: specimens 30974 and 37999 of the IbA10G2 subtype and 30976 and 33537 of the IaA28R4 subtype. Specimen 30974 was collected from a patient from a cryptosporidiosis outbreak in July 2010 in Columbia, South Carolina associated with a splash pad that had problems with filtration and chlorination. Testing of filter backflush and stools from six patients all identified the presence of the C. hominis IbA10G2 subtype. Specimen 30976 was collected from a patient in a cryptosporidiosis outbreak in July 2010 in the St. Louis area in Illinois and Missouri associated with swimming pools and a water park. Testing of nine patient specimens identified the occurrence of C. hominis IaA28R4 in seven patients, IaA24R4 in one patient, and IdA15G1 in another patient. Specimen 33537 was collected from a patient from a cryptosporidiosis outbreak in July 2011 in Walsenburg, Colorado associated with a waterpark that had problems with the chlorinator. Testing of filter backflush and stools from five patients identified IaA28R4 in all. Specimen 37999 was collected from a sporadic cryptosporidiosis patient in Twin Falls, Idaho in September 2012. All stool specimens were collected fresh from symptomatic patients and stored in 2.5% potassium dichromate at 4°C prior to being used in Cryptosporidium oocyst isolation for whole genome sequencing within 6 months. Cryptosporidium species and subtypes were determined by PCR-RFLP analysis of the small subunit rRNA and sequence analysis of the 60 kDa glycoprotein (gp60) genes, respectively .
Oocyst isolation and whole genome amplification
Cryptosporidium oocysts were isolated from stool specimens by discontinuous sucrose and cesium chloride gradients as previously described. They were further purified by immunomagnetic separation using the Dynabeads Anti-Cryptosporidium kit (Invitrogen, Carlsbad, CA). After treating the purified oocysts with 10% commercial bleach on ice for 10 min and five cycles of freezing and thawing, DNA was extracted from them by using the Qiagen DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA). Whole genome amplification (WGA) of the 25–100 ng of extracted DNA was conducted by using the REPLI-g Midi Kit (Qiagen). The quality of the WGA products was verified by sequencing BamHI-digested WGA products cloned into a pUC19 vector (Fermentas, Pittsburgh, PA). The sequencing was done by using the ABI BigDye Terminator v3.1 Cycle Sequencing Kit on an ABI3130 Genetic Analyzer (Applied Biosystems, Foster City, CA).
454 and Illumina sequencing and de novo contig assembly
The WGA products from specimens 30974 and 33537 were sequenced with 454 technology on a GS-FLX Titanium System (Roche, Branford, CT) by using approximately 1 μg of DNA for library construction and following standard Roche library protocols, with an average insert size of 600 bp. One full PTP plate was used in the analysis of each specimen. The sequence reads from each run were assembled using Newbler in the GS De Novo Assembler (http://www.454.com/products/analysis-software/) with the default settings.
The WGA products from specimens 30976 and 37999 were used to generate Illumina TruSeq (v3) libraries (average insert size: 350 bp) and sequenced 100×100 bp paired-end on an Illumina Genome Analyzer IIx (Illumina, San Diego, CA). The sequence reads with a minimum quality of 20 were trimmed by using CLC Assembly Cell 4.1.0 (http://www.clcbio.com/products/clc-assembly-cell/). The data were then assembled with default parameters and a minimum contig length of 500 bp, with scaffolding using paired-end data.
Comparative genomic analyses
For comparisons of sequences at the genome level, contigs of each specimen were aligned with reference sequences of the near complete genome of the C. parvum IOWA isolate (version AAEE00000000.1) and the 1,422 contigs of the C. hominis TU502 isolate (version NZ_AAEL00000000.1) using Nucmer, a tool in MUMmer 3.23 (http://mummer.sourceforge.net/). Multiple genome alignments were also constructed by using the progressive alignment algorithm of Mauve 2.3.1 (http://asap.genetics.wisc.edu/software/mauve/) with default options. In-house Perl scripts were developed to calculate the average nucleotide identities. For the detection of SNPs, FastQC 0.10.0 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used for the QC analysis of Illumina sequence reads, and PRINSEQ 0.20.3 (http://prinseq.sourceforge.net/) was used to remove low quality reads, with a min_qual_mean setting of 20 and min_len of 65. Reads were then aligned to reference sequences by using Bowtie 0.12.7 (http://bowtie-bio.sourceforge.net/index.shtml). The resulting SAM files were processed and sorted, and duplicates were removed, by using Picard 1.126 (http://broadinstitute.github.io/picard/). The mpileup command in SAMtools (http://samtools.sourceforge.net/) was finally used to create the pileup file for SNP variant calls using mpileup2snp in VarScan 2.3.7 (http://varscan.sourceforge.net/). Default parameters for VarScan were used except that min-avg-qual was set to 30.
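The in-house scripts used for average nucleotide identity are not shown, so the following is only a sketch of one plausible calculation, written in Python rather than Perl. It assumes identity is averaged over alignment blocks weighted by aligned length, and that each block is given as an (aligned length, percent identity) pair of the kind that can be read from MUMmer's show-coords output. The function name and the example blocks are hypothetical.

```python
from typing import Iterable, Tuple


def average_nucleotide_identity(blocks: Iterable[Tuple[int, float]]) -> float:
    """Length-weighted mean percent identity over alignment blocks.

    Each block is (aligned_length_bp, percent_identity). Weighting by
    length is an assumption; the study's own scripts may differ.
    """
    block_list = list(blocks)
    total_length = sum(length for length, _ in block_list)
    if total_length <= 0:
        raise ValueError("no aligned bases to average over")
    weighted = sum(length * identity for length, identity in block_list)
    return weighted / total_length


# Made-up alignment blocks for illustration.
demo_blocks = [(10_000, 97.0), (5_000, 96.0), (5_000, 99.0)]
demo_ani = average_nucleotide_identity(demo_blocks)
print(f"{demo_ani:.2f}%")  # 97.25%
```

Weighting by aligned length prevents many short, divergent blocks from dominating the estimate, which matters for fragmented assemblies.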
As the comparative genomic analysis had identified some nucleotide sequences (AAEL01000413, AAEL01000728, and AAEL01000717) in the published C. hominis genome that had not been seen in the published C. parvum genome, primers were designed based on these sequences to verify their source by PCR (Additional file 6: Table S1). Five specimens each of C. parvum and C. hominis were used in PCR analysis of each target. In addition, two C. andersoni specimens were used to confirm the Cryptosporidium origin of contig AAEL01000728. Each specimen was analyzed in duplicate by nested PCR using a 50 μl PCR mixture consisting of 1 μl (~100 ng) of extracted DNA or 2 μl of primary PCR products (in secondary PCR), 200 μM deoxynucleoside triphosphate, 1× PCR buffer (Applied Biosystems), 3.0 mM MgCl2, 5.0 U of Taq polymerase (Promega, Madison, WI), 100 nM primers, and 400 ng/μl of non-acetylated bovine serum albumin (Sigma-Aldrich, St. Louis, MO). The primary and secondary PCR reactions were performed in a GeneAmp PCR 9700 thermocycler (Applied Biosystems) for 35 cycles of 94°C for 45 s, 55°C for 45 s, and 72°C for 60 s, with an initial denaturation (94°C for 5 min) and a final extension (72°C for 7 min). The secondary PCR products were sequenced in both directions using the Sanger technology described above. Nucleotide sequences obtained were aligned with reference sequences downloaded from GenBank by using ClustalX (http://www.clustal.org/).
NCBI BioProject No.
Nucleotide sequences generated from the project, including all SRA data and assembled contigs, were submitted to the NCBI BioProject under the accession number PRJNA252787.
The study was done on delinked residual diagnostic specimens. It was covered by Human Subjects Protocol No. 990115 “Use of residual human specimens for the determination of frequency of genotypes or sub-types of pathogenic parasites”, which was reviewed and approved by the Institutional Review Board of the Centers for Disease Control and Prevention (CDC). No personal identifiers were associated with the specimens at the time of submission for diagnostic service at CDC.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.