Comparative analysis reveals conservation in genome organization among intestinal Cryptosporidium species and sequence divergence in potential secreted pathogenesis determinants among major human-infecting species

Background Cryptosporidiosis is a major cause of gastrointestinal diseases in humans and other vertebrates. Previous analyses of invasion-related proteins revealed that Cryptosporidium parvum, Cryptosporidium hominis, and Cryptosporidium ubiquitum mainly differed in copy numbers of secreted MEDLE proteins and insulinase-like proteases and sequences of mucin-type glycoproteins. Recently, Cryptosporidium chipmunk genotype I was identified as a novel zoonotic pathogen in humans. In this study, we sequenced its genome and conducted a comparative genomic analysis. Results The genome of Cryptosporidium chipmunk genotype I has gene content and organization similar to C. parvum and other intestinal Cryptosporidium species sequenced to date. A total of 3783 putative protein-encoding genes were identified in the genome, 3525 of which are shared by Cryptosporidium chipmunk genotype I and three major human-pathogenic Cryptosporidium species, C. parvum, C. hominis, and Cryptosporidium meleagridis. The metabolic pathways are almost identical among these four Cryptosporidium species. Compared with C. parvum, a major reduction in gene content in Cryptosporidium chipmunk genotype I is in the number of telomeric genes encoding MEDLE proteins (two instead of six) and insulinase-like proteases (one instead of two). Highly polymorphic genes between the two species are mostly subtelomeric ones encoding secretory proteins, most of which have higher dN/dS ratios and half are members of multiple gene families. In particular, two subtelomeric ABC transporters are under strong positive selection. Conclusions Cryptosporidium chipmunk genotype I possesses genome organization, gene content, metabolic pathways and invasion-related proteins similar to the common human-pathogenic Cryptosporidium species, reaffirming its human-pathogenic nature. 
The loss of some subtelomeric genes encoding insulinase-like proteases and secreted MEDLE proteins and high sequence divergence in secreted pathogenesis determinants could contribute to the biological differences among human-pathogenic Cryptosporidium species. Electronic supplementary material The online version of this article (10.1186/s12864-019-5788-9) contains supplementary material, which is available to authorized users.

invasion-related proteins in apicomplexans or modifying host cell proteins [14]. Mucin-type glycoproteins are known to be involved in the attachment and invasion of Cryptosporidium spp. [15]. Compared with C. parvum, a reduction in the numbers of genes encoding the MEDLE family secreted proteins and insulinase-like proteases was seen in the 3′ subtelomeric regions of chromosomes 5 and 6 of the C. hominis genome [11]. The orthologous regions encoding subtelomeric insulinases and MEDLE proteins are entirely absent in the genomes of C. ubiquitum and gastric species Cryptosporidium andersoni [12]. In addition to the gene losses, genetically related Cryptosporidium species differ significantly in sequences of mucin-type glycoproteins [11,12]. As intestinal and gastric Cryptosporidium species differ significantly in the numbers and sequences of genes encoding mucin-type glycoproteins and insulinase-like proteases, these proteins and other secreted pathogenesis determinants (SPDs) potentially play an important role in tissue tropism also [12].
Although the genomes of several Cryptosporidium species have been sequenced recently, we still have very limited knowledge of genome evolution among Cryptosporidium spp. [16,17]. In this study, we have sequenced the genome of Cryptosporidium chipmunk genotype I and conducted a comparative genomic analysis of eight Cryptosporidium species that have been sequenced thus far [11,12,18-20].

Genome features
We generated 6.8 million 250-bp paired-end reads from one Cryptosporidium chipmunk genotype I isolate (37763) from a naturally infected person in the United States by Illumina sequencing. After filtering out contigs from contaminants among the 298 initial contigs generated using the CLC Genomics Workbench, we assembled a Cryptosporidium genome of 9.05 Mb in 50 contigs (without any scaffolding during the processing), with an estimated 188-fold coverage and an N50 of 320,570 bp. We combined gene prediction results obtained from Augustus, Geneid, and Genemark, leading to the identification of 3783 protein-encoding genes. At the genome level, Cryptosporidium chipmunk genotype I has high nucleotide and amino acid sequence identity to C. parvum (82.25 and 83.49%, respectively), C. hominis (82.48 and 83.99%, respectively), and C. meleagridis (81.22 and 81.68%, respectively; Table 1). Among the eight Cryptosporidium species with whole genome sequence data, Cryptosporidium chipmunk genotype I has the highest GC content in the overall genome (32.0%) and coding regions (33.6%). The genome of Cryptosporidium chipmunk genotype I has near complete sequence synteny with that of C. parvum and C. ubiquitum (Fig. 1a), with a rearrangement of ~126 kb between Cryptosporidium chipmunk genotype I and C. parvum. The 5′ subtelomeric region of chromosome 6 in Cryptosporidium chipmunk genotype I, which contains 52 genes, is translocated with the 5′ subtelomeric region of chromosome 8 containing 53 genes (cgd8_10~cgd8_530) in C. parvum. This rearrangement was observed in both assemblies produced by the CLC Genomics Workbench and the SPAdes assembler. Long-read sequencing using the PacBio technology is needed to confirm the existence of this genome rearrangement. Lower synteny was seen with genomes of C. baileyi and C. andersoni. Cryptosporidium chipmunk genotype I shares almost the same gene density and number of tRNA genes with other Cryptosporidium spp.
It, however, has gene content slightly lower than C. parvum and C. hominis, but similar to C. meleagridis, C. ubiquitum, and C. baileyi (Table 1).
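The N50 reported above is a standard contiguity statistic: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions, not values from this assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy assembly (not the real contig lengths of isolate 37763).
toy_contigs = [500, 400, 300, 200, 100]
print(n50(toy_contigs))
```

Applied to the lengths of the 50 contigs of the actual assembly, the same function would be expected to reproduce the reported value of 320,570 bp.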
Orthology delineation identified only a small number of species-specific genes among eight Cryptosporidium spp. Approximately 3525 genes are shared by C. parvum, C. hominis, C. meleagridis, and Cryptosporidium chipmunk genotype I (Fig. 1b). There are only three Cryptosporidium chipmunk genotype I-specific genes. One of them was identified as an insulinase-like protease, but the functions of the other two genes are unknown. Phylogenetic analysis of amino acid sequences from 100 orthologous genes supported the close relatedness of Cryptosporidium chipmunk genotype I to these human-pathogenic Cryptosporidium species (Fig. 2a).
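Counting shared and species-specific genes of this kind reduces to set operations on ortholog group assignments. The sketch below illustrates the idea under the assumption that each species has already been mapped to a set of ortholog group identifiers (for example, from OrthoMCL output); the function names, species keys, and group IDs are hypothetical.

```python
def shared_groups(groups_by_species):
    """Return ortholog group IDs present in every species."""
    sets = [set(groups) for groups in groups_by_species.values()]
    return set.intersection(*sets) if sets else set()


def species_specific(groups_by_species, species):
    """Return ortholog group IDs found only in the given species."""
    others = set().union(
        *(set(g) for name, g in groups_by_species.items() if name != species)
    )
    return set(groups_by_species[species]) - others


# Hypothetical toy assignments; the group IDs are placeholders.
toy = {
    "chipmunk_I": {"OG1", "OG2", "OG3", "OG9"},
    "C_parvum": {"OG1", "OG2", "OG3", "OG4"},
    "C_hominis": {"OG1", "OG2", "OG4"},
    "C_meleagridis": {"OG1", "OG2", "OG3"},
}
print(sorted(shared_groups(toy)))
print(sorted(species_specific(toy, "chipmunk_I")))
```

The sizes of these sets correspond to the central and outermost regions of a Venn diagram such as Fig. 1b.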
Multiple gene families are present in Cryptosporidium chipmunk genotype I as well as other Cryptosporidium species. Protein architecture network analysis of Cryptosporidium chipmunk genotype I, C. parvum, and C. meleagridis revealed the existence of several clusters (Fig. 3a). Two of the major clusters (1 and 2) in the network consisted of protein kinases and insulinase-like peptidases of the three Cryptosporidium species. There are 75, 79, and 78 genes encoding protein kinases in Cryptosporidium chipmunk genotype I, C. parvum, and C. meleagridis, respectively. C. parvum possesses 23 genes encoding insulinase-like peptidases. ATPases associated with diverse cellular activities (AAA) and ATP-binding cassette (ABC) transporters formed Clusters 4 and 5. We found 21 genes encoding ABC transporters in all three species. Compared with C. parvum and C. meleagridis, one gene encoding AAA proteins was lost in Cryptosporidium chipmunk genotype I (24 AAA proteins). In addition, the Ras proteins, which are involved in intracellular signaling, formed Cluster 7. Furthermore, the 12 thrombospondin-related adhesive proteins (TRAPs), which are presumably microneme proteins present in all three Cryptosporidium species under analysis [21,22], are included in Cluster 8 (Fig. 3b). Like other Cryptosporidium spp., Cryptosporidium chipmunk genotype I lacks genes encoding enzymes for de novo isoprenoid biosynthesis. Two genes encoding farnesyl diphosphate (FPP) synthase (Cch_19.1677) and polyprenyl synthase (Cch_17.1265) were detected in Cryptosporidium chipmunk genotype I. These two genes were shown to be transcribed in C. parvum in vitro [23], but are absent in C. ubiquitum [12].

Electron transport chain
A progressive reduction in the electron transport chain was reported in Cryptosporidium spp. [12]. Most intestinal Cryptosporidium spp. have an alternative oxidase (AOX) and a reduced conventional electron transport system, except for C. ubiquitum, which lacks both. Unlike C. ubiquitum, Cryptosporidium chipmunk genotype I and the three major human-pathogenic species possess all enzymes and proteins involved in ubiquinone biosynthesis (Fig. 4).
The number of mitochondrial carrier proteins in Cryptosporidium spp. is in agreement with the nature of the electron transport system. As reported previously [12], gastric Cryptosporidium spp. have more mitochondrial carrier proteins than intestinal Cryptosporidium spp. (Table 3). Among the latter, eight mitochondrial carrier proteins were detected in Cryptosporidium chipmunk genotype I and C. meleagridis, compared with nine in C. parvum and C. hominis and six in C. ubiquitum and C. baileyi, which also do not have the AOX (Table 3). These data indicate that the mitosome metabolic capability in Cryptosporidium chipmunk genotype I is similar to that in the three major human-pathogenic Cryptosporidium species.
Fig. 3 Protein architecture network based on sequence similarity of all proteins in proteomes of Cryptosporidium chipmunk genotype I, Cryptosporidium parvum, and Cryptosporidium meleagridis. a Proteins of Cryptosporidium chipmunk genotype I, C. parvum, and C. meleagridis, represented by the colors red, green, and blue, respectively. b Identity of major clusters in the Cryptosporidium proteome

Nucleotide metabolism
Cryptosporidium spp. cannot synthesize purine rings or pyrimidines de novo (Table 2). Instead, they must salvage these nucleotides from the host via the nucleoside transporter (Table 3). However, the enzymes involved in the inter-conversion of purines and pyrimidines differ among Cryptosporidium species. The gene encoding the guanosine monophosphate (GMP) synthase (cgd5_4520 in C. parvum) is lost in Cryptosporidium chipmunk genotype I, indicating that Cryptosporidium chipmunk genotype I cannot convert xanthosine 5′-phosphate (XMP) to GMP. Furthermore, the last gene (cgd1_3860) in chromosome 1 of C. parvum, which encodes a deoxyuridine triphosphate (dUTP) diphosphatase, has an ortholog in C. hominis (Chro.10434), but is absent in Cryptosporidium chipmunk genotype I and C. meleagridis (Additional file 1: Table S1). The ortholog of another dUTP diphosphatase gene in C. parvum (cgd7_5170), however, is present in Cryptosporidium chipmunk genotype I (Cch_42.3131).

N-glycan and GPI-anchor precursors in Cryptosporidium chipmunk genotype I
A secondary loss of Alg genes in asparagine (N)-linked glycosylation was reported in apicomplexans [24]. The biosynthesis of N-glycans is different not only among apicomplexan parasites but also within the genus Cryptosporidium. Similar to C. hominis, C. parvum, C. meleagridis, and C. ubiquitum, Cryptosporidium chipmunk genotype I possesses nine sugars in N-glycan precursors, compared to eight sugars in C. baileyi and five in C. andersoni.
In glycosylphosphatidylinositol (GPI) anchor biosynthesis, the essential phosphatidylinositol glycan (PIG)-B was detected in Cryptosporidium chipmunk genotype I but lost in C. ubiquitum. Similar to other Cryptosporidium spp., genes encoding PIG-W and glycosylphosphatidylinositol deacylase (PGAP1) involved in the acylation and de-acylation of inositol are absent in Cryptosporidium chipmunk genotype I.

[Table rows, values omitted: Conversion between UDP-Glc and UDP-Gal; Conversion between GDP-Man and GDP-Fuc; Conversion from UDP-Glc to UDP-GlcA to UDP-Xyl; Synthesis of mannitol from fructose; Fatty acid biosynthesis in cytosol (FAS I); Oxidative phosphorylation (NADH dehydrogenase); F-ATPase; Alternative oxidase (AOX); Conversion from serine to glycine]

Characteristics of invasion-related proteins in Cryptosporidium chipmunk genotype I
Cryptosporidium chipmunk genotype I and other intestinal Cryptosporidium spp. possess similar numbers and components of major protein families, including some of those involved in invasion, such as protein kinases and TRAPs. Cryptosporidium species, however, differ in the number of genes encoding other invasion-related proteins, such as insulinase-like peptidases, MEDLE secretory proteins, and mucin glycoproteins. For example, gastric species C. andersoni and C. muris have fewer genes encoding insulinase-like peptidases (Fig. 5). Compared with C. parvum, two of the 23 insulinase-like protease genes and four of the six MEDLE family protein genes are lost in Cryptosporidium chipmunk genotype I, all located at the subtelomeric regions of chromosomes 5 and 6 (Additional file 1: Table S2). A new gene (Cch_105.391) of the insulinase gene family, which has significant sequence similarity to cgd3_4260, was detected at the 5′ end of chromosome 7 (contig_105). Furthermore, all three major human-infecting species, C. parvum, C. hominis, and C. meleagridis, possess MEDLE protein genes, but none of them were observed in C. ubiquitum, C. baileyi, C. andersoni, or C. muris (Additional file 1: Table S2).
Comparisons of mucin-type glycoproteins among eight Cryptosporidium species showed a high divergence between human-infecting and animal-infecting species. The gp60/40/15 complex, which is encoded by a single-copy gene in Cryptosporidium chipmunk genotype I, is absent in C. andersoni and C. muris, but has 7 paralogous genes in two clusters in C. baileyi. Cryptosporidium chipmunk genotype I possesses a series of mucin-type glycoproteins, such as CP2, but many of them are absent in C. baileyi, C. andersoni, or C. muris (Additional file 1: Table S2). Phylogenetic analysis of invasion-related proteins, including mucin-type glycoproteins, insulinase-like proteases and TRAPs, confirmed the close relatedness of Cryptosporidium chipmunk genotype I to human-infecting species (Fig. 2b-d).

Highly divergent genes between Cryptosporidium chipmunk genotype I and Cryptosporidium parvum
The putative proteome of Cryptosporidium chipmunk genotype I was compared with the annotated protein-encoding genes of C. parvum and C. ubiquitum. We found 49 highly divergent genes between Cryptosporidium chipmunk genotype I and these two Cryptosporidium species with an amino acid identity below 65% (Additional file 1: Table S3). Among them, 43 (87.8%) genes encode proteins with signal peptides, 41 (84.9%) are located in the subtelomeric regions, and 25 (51.0%) possess paralogous genes. Many of the genes encode

Genes under selection pressure
The dN/dS analysis was used to identify orthologous genes under selection between Cryptosporidium chipmunk genotype I and C. parvum, two species with different host ranges. Genes encoding invasion-related proteins, secreted proteins, and surface-associated proteins, which could be involved in host immune responses, exhibited elevated dN/dS ratios. In contrast, genes encoding proteins that are involved in metabolic pathways had reduced dN/dS ratios (Fig. 6). Among all orthologous genes, there are only six genes with dN/dS ratios > 1, thus under positive selection. Two of them (C_ch_8.3686 and C_ch_8.3664) encode ABC transporters. Among the 20 orthologous genes with the highest dN/dS ratios, 9 (45%) encode proteins with signal peptides, 11 (55%) encode membrane-bound proteins, and 14 (70%) are located in the subtelomeric regions (Table 4).
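The interpretation used above follows the usual convention: a dN/dS ratio above 1 is taken as evidence of positive selection, while ratios well below 1 indicate purifying selection. The sketch below only illustrates this thresholding step, assuming dN and dS have already been estimated per ortholog pair (for example, by KaKs_Calculator); the function name and the toy values are hypothetical and are not results from this study.

```python
def dnds_ratio(dn, ds):
    """Return (omega, is_positive) for one ortholog pair.

    omega is dN/dS; is_positive is True when omega > 1.
    When dS is zero the ratio is undefined and (None, False) is returned.
    """
    if ds == 0:
        return None, False
    omega = dn / ds
    return omega, omega > 1


# Hypothetical toy estimates (gene names and values are placeholders).
toy_estimates = {
    "gene_a": (0.30, 0.20),  # dN > dS
    "gene_b": (0.05, 0.50),  # dN << dS
    "gene_c": (0.10, 0.00),  # dS of zero, ratio undefined
}
for gene, (dn, ds) in toy_estimates.items():
    omega, positive = dnds_ratio(dn, ds)
    print(gene, omega, positive)
```

Pairs with a dS of zero are kept apart rather than assigned an arbitrary ratio, since they carry no information about the relative rate of nonsynonymous change.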

Discussion
Results of comparative genomic analysis in this study suggest that the metabolic pathways in Cryptosporidium chipmunk genotype I are similar to those in major human-infecting Cryptosporidium species, including C. parvum, C. hominis, and C. meleagridis [18,19]. Unlike C. muris and C. andersoni [12], Cryptosporidium chipmunk genotype I does not use the TCA cycle or conventional oxidative phosphorylation for energy production. Like C. parvum and C. hominis, Cryptosporidium chipmunk genotype I possesses an alternative oxidative phosphorylation chain, which is lost in C. ubiquitum and C. baileyi. The similarity in metabolism between Cryptosporidium chipmunk genotype I and other human-infecting species is a reflection of their genetic relatedness. This has been confirmed by results of phylogenetic analyses of 100 conserved proteins and several families of invasion-related proteins. The genome organization of Cryptosporidium chipmunk genotype I is also similar to other intestinal Cryptosporidium species. The genome sizes of the human-pathogenic Cryptosporidium species are all near 9 Mb, which is slightly smaller than the 9.21 Mb in C. muris. As expected, Cryptosporidium chipmunk genotype I has a gene content just slightly lower than human-pathogenic Cryptosporidium species. In contrast, the genomes of seven Eimeria species in chickens vary significantly in size, with the number of predicted protein-encoding genes over a range of 6,000-10,000 genes [25]. Similar differences in genome sizes and gene contents exist among Plasmodium spp. [26] or Babesia spp. [27]. Thus, compared with other apicomplexans, intestinal Cryptosporidium species have shown high genome conservation. The differences in host range among intestinal Cryptosporidium species could potentially be caused by minor gene gains and losses or sequence polymorphism in SPDs encoded by genes located in subtelomeric regions.
Compared with C. parvum, a major reduction in gene content in Cryptosporidium chipmunk genotype I is in the number of subtelomeric genes encoding secreted MEDLE proteins and insulinase-like proteases. Cryptosporidium parvum has two subtelomeric genes for insulinase-like proteases (cgd6_5520-5510 and a paralog of it), compared to one in Cryptosporidium chipmunk genotype I (Cch_105.391, a paralog of cgd3_4260), one (cgd5/6_5520-5510 ortholog) in C. meleagridis, and none in C. hominis. The loss of these and some subtelomeric genes encoding secreted MEDLE family proteins in Cryptosporidium chipmunk genotype I (6, 2, 2, and 1 copy for C. parvum, C. meleagridis, Cryptosporidium chipmunk genotype I, and C. hominis, respectively) may contribute to its narrow host range. In contrast, the number of genes for mucin-type glycoproteins in Cryptosporidium chipmunk genotype I is similar to that in human-infecting species. Cryptosporidium chipmunk genotype I, C. hominis, C. parvum, and C. meleagridis possess 24 genes encoding mucin-type glycoproteins, whereas gastric species, such as C. andersoni and C. muris, have lost 16 of them, including those encoding gp60, Muc4, and Muc5, which are important in the attachment and invasion of C. parvum [28].
The significance of other gene gains and losses in the genome of Cryptosporidium chipmunk genotype I is not yet clear. The gene Cch_35.2955, which has three other paralogs in Cryptosporidium chipmunk genotype I, was annotated as a new gene at the 3′ end of chromosome 5. C. parvum has three orthologs (cgd5/6_5500, cgd6_5500 and cgd8_10) while C. hominis has six (Chro.00007, Chro.60010, Chro.60630, Chro.60631, Chro.60634 and Chro.80010). There is also a loss of the cgd8_660_670 ortholog in chromosome 8 of Cryptosporidium chipmunk genotype I. This gene encodes a large low complexity protein in C. parvum and has a paralog (cgd8_680_690) downstream. Likewise, C. hominis has only one member of this multigene family [11]. In addition, Cryptosporidium chipmunk genotype I has lost several other genes, such as orthologs of cgd4_3690 (encoding a large glycine-rich repeat low complexity protein), cgd4_4500 (encoding a cysteine-rich protein), cgd5_2960 (encoding a DEAD/DEAH box helicase), cgd5_2980 (encoding another DEAD/DEAH box helicase), and cgd8_4180 (encoding a glycine-rich low complexity protein) in C. parvum. Although the functions of these proteins are mostly unknown, these gene losses could contribute to the narrow host range of Cryptosporidium chipmunk genotype I.
Most of the highly divergent genes between Cryptosporidium chipmunk genotype I and other Cryptosporidium spp. encode secreted proteins and half of the highly divergent genes are located in the subtelomeric regions. These secreted proteins could potentially be SPDs in Cryptosporidium spp., and thus play a role in host specificity of Cryptosporidium spp., especially SKSR, FLGN and mucin proteins. Among them, the number of genes encoding SKSR proteins is different between C. parvum IIa and IId subtype families, which have different host preferences [13]. As in the C. parvum IId subtype family, 7 paralogous genes encoding SKSR proteins were detected in Cryptosporidium chipmunk genotype I, but the sequences of these genes were divergent from those in C. parvum. The high sequence diversity of mucin-type glycoproteins between human- and animal-infecting species may also contribute to the host specificity and tissue tropism among Cryptosporidium spp. Previously, secretory proteins from dense granules (GRAs), micronemes (MICs), rhoptries (ROPs), and the SRS super-family were identified as potential SPDs in T. gondii, which could be responsible for differences in transmission modes, pathogenicity, and host range among T. gondii strains [29].
The elevated dN/dS ratios for secreted and surface-associated proteins support their function as SPDs. These proteins are apparently under selection, perhaps as a result of high immune pressure due to their importance in invasion and host-parasite interactions. A similar observation was made in comparative analysis of C. parvum and C. hominis genomes [30,31]. Most of the genes with higher dN/dS ratios are located in the subtelomeric regions, supporting the previous conclusion that they undergo more rapid evolution. Three genes encoding ABC transporters are among the top 20 genes with the highest dN/dS ratios between Cryptosporidium chipmunk genotype I and C. parvum. ABC transporters are "key components of the cellular machinery for endobiotic and xenobiotic detoxification", and thus may contribute to intrinsic drug resistance in Cryptosporidium spp. [32]. These genes are expected to be under positive selective pressure. Indeed, several ABC transporters were previously identified as highly divergent genes between C. parvum IIa (zoonotic) and IIc (anthroponotic) subtype families [33]. Interestingly, two of them, cgd2_80 and cgd2_90, are also within the same region (cgd2_70 and cgd2_90) identified as going through positive selection in the present study. These three ABC transporters encoded by genes within the ABC transporter gene cluster (cgd2_60 to cgd2_90) could be potential targets for drug development.
Fig. 6 Selective pressure in genes encoding major groups of proteins as indicated by the dN/dS ratios between Cryptosporidium chipmunk genotype I and C. parvum. Red categories represent groups of proteins with mean dN/dS ratios higher than all proteins in the proteome, while blue categories represent groups of proteins with reduced dN/dS ratios. Triangles: mean dN/dS; horizontal black line: median dN/dS

Conclusions
Cryptosporidium chipmunk genotype I apparently possesses metabolic pathways and invasion-related proteins similar to those in C. parvum, C. hominis, and C. meleagridis. This supports the human-pathogenic nature of Cryptosporidium chipmunk genotype I. The loss of two

Specimen collection and whole-genome sequencing
Cryptosporidium chipmunk genotype I isolate 37763 was collected from one human specimen in Vermont and diagnosed by DNA sequence analysis of the small subunit rRNA gene [34]. Oocysts were purified from the specimen using sucrose and cesium chloride density gradient centrifugations and immunomagnetic separation [35]. The purified oocysts were subjected to five freeze-thaw cycles and overnight digestion with proteinase K. Genomic DNA was extracted from the oocysts by using the QIAamp® DNA Mini Kit (Qiagen Sciences, Maryland, 20874, USA) and amplified by REPLI-g Midi Kit (Qiagen GmbH, Hilden, Germany). For whole-genome sequencing, 250-bp paired-end reads were generated from the DNA by using Illumina HiSeq 2500 analysis of an Illumina TruSeq (v3) library. After trimming for adapter sequences and poor sequence quality (Phred score less than 25), the sequence reads were assembled de novo by using CLC Genomics Workbench with word size of 63 and bubble size of 500. In a secondary analysis, the genome was also assembled using SPAdes 3.1 (http://cab.spbu.ru/software/spades/).
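The quality-trimming step above can be illustrated with a simple 3′-end trimmer. This is a sketch only: it is not the trimming algorithm implemented in CLC Genomics Workbench, and it assumes FASTQ quality strings in Phred+33 encoding. The function names and the toy read are hypothetical; only the cutoff of 25 comes from the text.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    The offset of 33 (Phred+33) is an assumption about the encoding.
    """
    return [ord(char) - offset for char in quality_string]


def trim_3prime(sequence, quality_string, min_quality=25, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_quality."""
    scores = phred_scores(quality_string, offset)
    end = len(sequence)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical toy read: 'I' decodes to Phred 40 and '#' to Phred 2.
read_seq = "ACGTACGT"
read_qual = "IIIIII##"
print(trim_3prime(read_seq, read_qual))
```

Production trimmers typically use a running-sum or sliding-window criterion rather than this base-by-base rule, so the number of bases removed can differ from what this sketch produces.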

Genome structure analysis and gene prediction
An alignment of the Cryptosporidium chipmunk genotype I genome and the published genomes of C. parvum IOWA isolate [18], C. hominis, C. ubiquitum [12], C. baileyi [20] and C. andersoni [12] was constructed by using Mauve 2.3.1 [36] with default parameters. Circos 0.69 [37] was used to visualize the syntenic relationship (regions with orthologous genes) between the Cryptosporidium chipmunk genotype I genome and the other four genomes. AUGUSTUS 3.2.1 [38], Geneid 1.4 [39], and GeneMark-ES [40] were used to predict protein-encoding genes in Cryptosporidium chipmunk genotype I with the default settings, after training AUGUSTUS and Geneid with the gene model of the C. parvum IOWA genome. The consensus predictor EVidenceModeler [41] was used to generate the gene set based on predictions from the three software packages.

Functional annotation
The predicted genes of Cryptosporidium chipmunk genotype I were annotated by using BLASTP [42] search of the GenBank NR database. Signal peptides and the transmembrane domains were predicted by using SignalP 4.1 [43] and TMHMM 2.0 [44], respectively. GPI-SOM webserver [45] was used to identify proteins with GPI anchor sites. Metabolism analysis was performed using the web server KAAS [46] with the BBH (Bi-directional Best Hit) method and eukaryote gene model. The online databases KEGG (Kyoto Encyclopedia of Genes and Genomes) (http://www.genome.jp/kegg/), Pfam (http://pfam.xfam.org/) [47], and LAMP (Library of Apicomplexan Metabolic Pathways, release-2) [48] were used to annotate catalytic enzymes, functional proteins, and metabolic pathways within the genome.

Comparative genomics analysis
BLASTP was used for sequence similarity searches among Cryptosporidium chipmunk genotype I and other Cryptosporidium genomes in CryptoDB (http://cryptodb.org/cryptodb/). Homologous gene families were identified by using OrthoMCL [49]. BLASTP and OrthoMCL were run with e-value thresholds of 1e-3 and 1e-5, respectively. A Venn diagram of shared orthologs and species-specific genes of C. parvum, C. hominis, C. ubiquitum, C. meleagridis, and Cryptosporidium chipmunk genotype I was drawn using VennPainter (https://github.com/linguoliang/VennPainter). The relationship among proteins in Cryptosporidium chipmunk genotype I, C. parvum, and C. meleagridis was visualized with Gephi (https://gephi.org/) with the Fruchterman-Reingold layout based on the result of BLASTP homology analysis, with a threshold of protein pairs sharing 30% identity over 100 amino acids. Comparative analyses of metabolism among Cryptosporidium spp. were based on the results of KAAS and data of LAMP. Pfam search results were used in comparisons of transporter proteins and invasion-related proteins among Cryptosporidium species. The nonsynonymous to synonymous substitution (dN/dS) ratios between Cryptosporidium chipmunk genotype I and C. parvum were calculated for orthologous genes using KaKs_Calculator 2.0 [50].
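The protein network described above can be thought of as a graph in which two proteins are connected when their BLASTP hit passes the identity and length threshold; clusters then correspond to connected groups of proteins. The sketch below illustrates this under stated assumptions: hits are given as (query, subject, percent identity, alignment length) tuples, the thresholds are applied as "at least 30% identity" and "at least 100 aligned residues" (the text does not say whether the bounds are inclusive), and clusters are taken as connected components rather than the visual clusters produced by the Gephi layout. All names and toy hits are hypothetical.

```python
from collections import defaultdict


def build_edges(hits, min_identity=30.0, min_length=100):
    """Keep protein pairs whose hit meets the identity and length thresholds."""
    edges = set()
    for query, subject, identity, aln_length in hits:
        if query == subject:
            continue  # ignore self-hits
        if identity >= min_identity and aln_length >= min_length:
            edges.add(tuple(sorted((query, subject))))
    return edges


def connected_components(edges):
    """Group proteins into clusters by depth-first traversal of the graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical toy hits: (query, subject, percent identity, alignment length).
toy_hits = [
    ("kinA", "kinB", 45.0, 250),
    ("kinB", "kinC", 32.0, 120),
    ("abc1", "abc2", 60.0, 400),
    ("kinA", "abc1", 28.0, 300),  # fails the identity threshold
    ("kinC", "abc2", 50.0, 80),   # fails the length threshold
]
clusters = connected_components(build_edges(toy_hits))
print(sorted(sorted(cluster) for cluster in clusters))
```

An edge list of this form can be exported to a file and loaded into Gephi for layout and coloring by species.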

Phylogenetic analysis
The amino acid sequences of 100 single-copy orthologs shared among Cryptosporidium species and Gregarina niphandrodes were extracted and concatenated to construct a phylogenetic tree. MUSCLE [51] was used to align the concatenated sequences and with poorly aligned positions being eliminated from the alignment