Skip to main content

Near-chromosome level genome assembly reveals ploidy diversity and plasticity in the intestinal protozoan parasite Entamoeba histolytica



Amoebozoa is a eukaryotic supergroup composed of unicellular and multicellular amoebic protozoa (e.g. Acanthamoeba, Dictyostelium, and Entamoeba). They are model organisms for studies in cellular and evolutionary biology and are of medical and veterinary importance. Despite their importance, Amoebozoan genome organization and genetic diversity remain poorly studied due to a lack of high-quality reference genomes. The slime mold Dictyostelium discoideum is the only Amoebozoan species whose genome is available at the chromosome-level.


Here, we provide a near-chromosome-level assembly of the Entamoeba histolytica genome, the second semi-completed Amoebozoan genome. The availability of this improved genome allowed us to discover inter-strain heterogeneity in ploidy at the near-chromosome or sub-chromosome level among 11 clinical isolates and the reference strain. Furthermore, we observed ploidy-independent regulation of gene expression, contrary to what is observed in other organisms, where RNA levels are affected by ploidy.


Our findings offer new insights into Entamoeba chromosome organization, ploidy, transcriptional regulation, and inter-strain variation, which will help to further decipher observed spectrums of virulence, disease symptoms, and drug sensitivity of E. histolytica isolates.


Entamoeba histolytica is an intestinal protozoan parasite that causes human amebiasis. It is a major causative agent of diarrheal diseases, which was ranked fifth in the 2015 list of diseases responsible for high disability-adjusted life years (DALYs) [1]. E. histolytica resides in the large intestine, which represents an anaerobic environment. Thus, it has evolved unique core metabolism, mitochondrial structure and function, and cellular activities, such as compartmentalized sulfate activation in the mitochondrion-related organelle (mitosome) [2] and internalization of live mammalian cells by trogocytosis [3].

Due to its medical importance and biological peculiarities, the first draft genome of the E. histolytica HM-1:IMSS reference strain was reported in 2005 [4]. The first reported assembly was 23.8 Mb long with 12.5-fold coverage and was segmented into 888 scaffolds. In the latest public database [5], the assembly is segmented into 1496 scaffolds. Unfortunately, the genome structure predicted by the first draft was highly fragmented, due to the repetitive nature of the genome (e.g. short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), tRNA arrays containing short tandem repeats [6], segmental duplications, and polyploidy), which complicates chromosome-level genome characterization.


We assembled the E. histolytica HM-1:IMSS Clone 6 2001 genome ab initio with near-chromosome level resolution utilizing PacBio and Hi-C [7]. The assembly workflow is shown in Extended Data 1 (Additional file 1). Briefly, we obtained reads with ~ 409-fold coverage by PacBio SMRT sequencing and ~ 963-fold by Hi-C. After construction of several assemblies using HGAP3 [8], HGAP4, and Flye [9], HGAP3 was the most redundant and lossless assembly, representing 35 Mb fragmented into 403 contigs that were scaffolded with Hi-C reads. The near-chromosome level scaffolds were arranged by JuiceBox [10], and gaps were filled using PBjelly [11]. The resultant genome is composed of 38 scaffolds with a length of 26,879,087 bp. The genome contains a total of 8734 protein-coding genes according to prediction by Companion [12] without RNA-seq data. The number of scaffolds is comparable to that suggested by a previous study, in which pulse-field gel electrophoresis was exploited (31–35 chromosomes of over 300 kb) [13]. An overview of all scaffolds is shown in Fig. 1. A total of 1421 of the 1496 contigs from the previous genome assembly were mapped to the newly assembled genome by BLAST [14] (top hits were indicated with purple barcodes in the outer circle in Fig. 1). Among the remaining 75 contigs, 73 were very short (1 ~ 3 kb) and two were contamination of the extrachromosomal plasmid [15].

Fig. 1

Overview of the genome of E. histolytica HM-1:IMSS Clone 6 2001. This circos plot shows overall structures with length, annotations, and coverage graphs of PacBio and Illumina reads of the genome of E. histolytica HM-1:IMSS Clone 6 2001. From outside to inside, newly assembled scaffolds and their length (gray blocks; with a scale of 1:10 kb), the previous genome contigs (purple bar codes), coding sequences + strand (green bar codes), coding sequences - strand (orange bar codes), tRNA genes (black bar codes), single copy genes (orange bar codes), PacBio read coverage (a black line), and Illumina read coverage (a blue line) are shown

Using this newly assembled genome, we examined the chromosome organization and ploidy of E. histolytica. In order to better understand E. histolytica ploidy, we examined the allelic frequencies of SNPs across the whole genome. We collected and selected high probability SNPs from the HM-1:IMSS Clone 6 2001 genome using the Fixed Ploidy Variant Detection tool in CLC Genomics Workbench 8.5.1 (QIAGEN). The normalized SNP numbers were plotted against the allele frequencies, and a histogram shows three well-separated peaks at approximately 25, 50, and 75% (Fig. 2a, top panel). These data are consistent with the premise that the E. histolytica genome is tetraploid. We confirmed that other representative E. histolytica strains (NA22, NA53, and NA76) also show a similar distribution pattern of SNP allelic frequencies (Fig. 2a, Extended Data 2 (Additional file 1)), further supporting the claim [13].

Fig. 2

The ploidy patterns of E. histolytica HM-1:IMSS Clone 6 2015 mock-12w and NA76. a, Histograms of SNP allele frequencies (%) vs. normalized numbers of genes (proportions of the total genes) analyzed in HM-1:IMSS Clone 6 2015 mock-12w (upper panel) and NA76 (lower panel). They show peaks at 25, 50, and 75% of allele frequencies in both strains, suggesting that the genome is tetraploid. b, An example of the ploidy pattern in HM-1:IMSS Clone 6 2015 mock-12w, suggesting that the ploidy of E. histolytica is 4 at large, but fluctuates in a chromosomal and sub-chromosomal specific fashion. c, The magnified image of the rectangle in magenta in Fig. 2b. Scaffold 9 is highlighted with a cyan rectangle. Based on the read mapping, the first half of scaffold 9 is tetraploid, whereas the second half is pentaploid (indicated with a magenta arrow). d, A schematic model of the ploidy of scaffold 9. There are four identical chromosomes of full length and one partial (cropped) chromosome corresponding to the second half of scaffold 9. e, The comparison of ploidy patterns among three isogenic strains: HM-1:IMSS Clone 6 2001, 2015 mock-12w, and 2015 mock-33w. The ploidy patterns altered during 15 years of cultivation (magenta arrows), but the tetraploidy is kept at large. On the other hand, only slight difference was observed after 21 additional weeks of cultivation after transfection of a mock empty plasmid in the presence of G418 (an cyan arrow)

We further investigated if ploidy varies at the chromosomal and/or sub-chromosomal levels. We mapped the Illumina reads of the HM-1:IMSS Clone 6 2001 assembly to the new E. histolytica genome assembled herein. We found that the coverage varies between scaffolds and between different regions on the same scaffold (Fig. 2b and c). The relative level of coverage indicates the number of homologous chromosomes (or chromosomal regions) that contain the gene. We normalized the whole coverage to be four, based on the assumption that the genome is tetraploid according to previous studies and as discussed above [13]. Accordingly, some scaffolds and some regions of scaffolds occasionally show a relative coverage of five, six, or seven. These results indicate that the genome is aneuploid; it is tetraploid in general, but is occasionally > 4-ploid at the chromosomal and sub-chromosomal levels and contains one or more additional copies of the shorter truncated chromosome, as shown in the example in Fig. 2d.

We investigated if the ploidy pattern is stable during in vitro cultivation. We compared the ploidy pattern between HM-1:IMSS Clone 6 2001 and HM-1:IMSS Clone 6 2015 mock-12w, the latter of which was derived from HM-1:IMSS Clone 6 2001, maintained in vitro in BI-S-33 medium for 15 years [16], transfected with an empty vector (pEhTex) [17, 18], and cultivated in vitro in the presence of 6 μg/mL G418 and 10 μg/mL tetracycline. The data show subtle but significant differences in the ploidy pattern between these two isogenic strains. We also compared the ploidy pattern between HM-1:IMSS Clone 6 2015 mock-12w and HM-1:IMSS Clone 6 2015 mock-33w, the latter of which was cultured for an additional 21 weeks in BI-S-33 medium containing G418 and tetracycline. Slight alterations to the ploidy pattern were found after the additional 21-week cultivation (Fig. 2e).

We next examined the ploidy patterns of 11 E. histolytica clinical strains isolated from patients with different clinical manifestations [Extended Data 4 (Additional file 1)]. Surprisingly, all clinical strains show distinct ploidy patterns which seem to be symptom-independent (Fig. 3).

Fig. 3

Ploidy patterns of 11 E. histolytica clinical isolates. The ploidy patterns of clinical isolates. The patterns of asymptomatic strains (NA11, NA53, and NA77), colitis strains (NA21, NA22, NA51, NA60, NA62, NA75, and NA76), and a liver abscess strain (NA19) are shown in green, blue and red rectangle, respectively. The pattern of HM-1:IMSS Clone 6 2015 mock-12w is also shown. All strains show different ploidy patterns, suggesting no direct correlation between the ploidy patterns and the clinical symptoms

Finally, we examined if ploidy patterns are associated with differential gene expression. Specifically, we attempted to determine whether all or only a subset of the homologous chromosomes is transcriptionally active. To test this, we compared RNA-seq data with genome mapping data from the NA11 and NA19 representative clinical strains. We chose scaffold 15 because its ploidy pattern is unstable and is thus appropriate for comparison. Genome mapping data indicate that in the NA11 strain, the first half of scaffold 15 is septaploid (7x) while the second half is tetraploid. However, in the NA19 strain, scaffold 15 is homogeneously tetraploid (Fig. 4a). We compared the RPKM values of all genes encoded in the first and second halves of scaffold 15 between the NA11 and NA19 strains (Fig. 4b). Interestingly, the observed variations in ploidy do not correlate with mRNA levels (Fig. 4b).

Fig. 4

Lack of correlation between ploidy and RNA level. a, The ploidy patterns of scaffold 15 (indicated by an magenta arrow) of NA11 and NA19. In NA11, the first half of scaffold 15 is septaploid, whereas the second half is tetraploid (left). In NA19, the entire scaffold 15 is tetraploid (middle). The magnified image of the corresponding area of NA11 in the left panel is shown in the right panel. b, Plots showing a linear association of the relative abundance of RNA, shown in RPKM values, of the genes located in the first (left) and second (right) halves of scaffold 15 between NA11 and NA19


Here, we have provided the improved reference genome of E. histolytica HM-1:IMSS Clone 6 2001 using PacBio and Hi-C. According to assembly statistics and BLAST barcoding analysis, our assembly appears to have significantly improved quality compared to the previously genome [4]. The new reference genome should help in the understanding of genome architecture, synteny between strains and species, SNPs discovery, and all downstream research.

In this study, we did not extract and analyze SNPs across strains. This decision was made based on our observation that almost all SNPs were located outside of open reading frames (ORFs) and no heterozygosities were found in ORFs, despite high gene density. The fact suggests the existence of strict selection pressures on the coding sequences in contrast to the plasticity of the intergenic regions. Nevertheless, our assembly enabled us to depict the diversity in the ploidy among E. histolytica strains. Ploidy varies widely between species, between cells, tissues, and organs of a given species, and also sometimes between individuals. The biological implications of ploidy are also diverse. It is well established that ploidy abnormalities can cause death or severe disorder in multicellular organisms. It is also known that changes to ploidy may act as a strong force driving evolution [19]. Amoebozoa is an interesting eukaryotic superfamily containing organisms with normal to extreme variations in ploidy: e.g. Acanthamoeba castellanii (< 25-ploid) [20], Amoeba proteus (500 ~ 1000-ploid) [21], and Polychaos dubium (1150 ~ 23,000-ploid) [21]. The ploidy of E. histolytica is variable because their chromatin does not condense in all stages of its life cycle. However, it was presumed, based on pulse-field gel electrophoresis and Southern hybridization using single copy gene probes, that E. histolytica had at most four sets of homologous chromosomes that may vary in length among strains [13].

Variances in ploidy are observed in cancer cells, which contained mutations in several key genes involved in cell cycle control and chromosome segregation [22]. It was also reported that Leishmania, another group of protozoa responsible for visceral and (muco)cutaneous leishmaniasis, possesses such chromosomal mosaicism which is likely caused by errors in chromosome replication [23]. It is worth noting that the E. histolytica genome lacks some key genes involved in cell cycle control and chromosome segregation (e.g. components of APC/C, cohesins, condensins, and kinetochore; Extended Data 3 (Additional file 1)). Of importance are the APC/C regulator Rae1 and Nup98, whose loss causes severe aneuploidy in mice [24]. Rae1 and Nup98 seem to be functionally lost from the E. histolytica genome according to BLAST e-values [Extended Data 3 (Additional file 1)].

Another important discovery of our study was that ploidy patterns are apparently unstable. Our data suggest that ploidy patterns at the near-chromosome level change during in vitro cultivation, while tetraploidy is maintained in general. Ploidy patterns may change upon adaptation to different environmental conditions such as hosts, organs, drugs, and intestinal microbiota. Alternatively, variations in ploidy patterns may simply reflect the genomic heterogeneity of the clinical strains. These data argue against the possibility that ploidy patterns are directly related to virulence in humans.

We observed no relationships between ploidy and expression levels in E. histolytica. This is in stark contrast to cancer cells and Leishmania, where DNA content and mRNA expression of some genes were correlated [23]. These data suggest that not all homologous chromosomes (or loci) may not be transcriptionally active in E. histolytica. Alternatively, one possible explanation for the lack of correlation between the ploidy number and the steady-state transcript levels is that while genes on homologous chromosomes are unbiasedly transcribed, there are some feedback mechanisms such as (1) RNA destabilization and degradation or (2) transcriptional repression via transcriptional gene silencing with antisense small RNA, as demonstrated in this organism [25]


In summary, we significantly improved the currently available E. histolytica reference genome by combining sequencing data from three NGS platforms. We demonstrated that the ploidy of isogenic strains is dynamic upon in vitro cultivation, and that ploidy patterns also vary among clinical strains. We also uncovered a mechanism that uncouples ploidy and gene expression level and leads to variations in genome structure and ploidy patterns, which may underly variation in parasite virulence. These findings have implications in the virulence spectrum and drug sensitivities observed among clinical parasite strains that cause a range of disease manifestations.


Organisms, cultivation, and DNA extraction

Trophozoites of isogenic E. histolytica strains (HM-1:IMSS Clone 6 2001, 2015 mock-12w, and 2015 mock-33w) were cultured in BI-S-33 medium [15]. A total of 11 clinical strains were obtained from outpatients were diagnosed for amebiasis at the AIDS Clinical Center (ACC) of the National Center for Global Health and Medicine (NCGM) in 2014–2017. The details regarding the culture medium, date of isolation, clinical manifestations of the patients and additional descriptions, as well as the genome sequencing platforms used are shown in Extended Data 4 (Additional file 1).

Illumina data generation

Genomic DNA was extracted from approximately 10 [7] cells of E. histolytica trophozoites using the QIAGEN Blood & Cell Culture DNA Kit with Genomic-tip 100/G (QIAGEN GmbH, Germany). Nuclei were isolated according to the manufacturer’s instructions (QIAGEN Genomic DNA Handbook). Purity and quantity of DNA samples were estimated using the Qubit dsDNA HS Assay Kit (Thermo-Fisher Scientific, Massachusetts, US).

PacBio data generation

A total of 30.3 μg of genomic DNA was extracted from approximately 3.0 × 10 [8] cells from E. histolytica HM-1 Clone 6 2001 trophozoites using the QIAGEN Blood & Cell Culture DNA Kit with Genomic-tip 100/G (Qiagen GmbH, Germany). The integrity of high molecular weight DNA was confirmed by agarose-gel electrophoresis. DNA templates of an average size > 20 kb were produced using the BluePippin Size-Selection System Template Prep Kit and the SMRTbell Template Prep Kit 1.0. The DNA templates were sequenced on 9 SMRT cells of the PacBio RS II (Pacific Biosciences, California, US) by the Okinawa Institute of Advanced Sciences (Okinawa, Japan). The SMRTbell adapters were removed from raw reads and filtered for quality by the institute.

Hi-C data generation

A total of 1.2 × 10 [8] log-phase trophozoites were collected by centrifugation at 500 x g for 5 min at 4 °C and washed 3 times with phosphate buffered saline (PBS). Cells were resuspended in 10 mL of PBS and mixed with 40 mL of buffer C1 (QIAGEN) in a 50 mL plastic conical tube. The tube was gently inverted 8 times and incubated on ice for 10 min. After the incubation, nuclei were collected by centrifugation at 1300 x g for 10 min at 4 °C, followed by resuspension in 10 mL of PBS. Isolated nuclei were crosslinked with 1% formaldehyde at room temperature (RT) for 15 min, with intermittent inversion of the tube. Excess formaldehyde was quenched by incubation with a final concentration of 1% (w/v) glycine for 15 min at RT by adding glycine powder. Fixed nuclei were washed with PBS, collected by centrifugation at 1500–2000 x g for 10 min at 4 °C, and stored frozen at − 30 °C until use. The frozen nuclei were analyzed by the commercial scaffolding service, Proximo Hi-C Genome Scaffolding, provided by Phase Genomics, Inc. (Seattle, Washington, US). Their draft scaffolds and raw reads were used for latter analyses.

Assembly of the HM-1:IMSS clone 6 2001 reference genome

The workflow for genome assembly is shown in a flow diagram [Extended Data 1 (Additional file 1)]. Briefly, genomic DNA extracted from nuclei was sequenced by PacBio RSII, and the reads were assembled using three assemblers: HGAP3, HGAP4, and Flye. For downstream analyses, we used the assembly created by HGAP3 because it was thought to be the most redundant and lossless assembly. The assembly was scaffolded using Hi-C linked reads by the Proximo Genome Scaffolding service. After scaffolding, we filled in gaps in the assembly using PBjelly. The output was manually corrected using Juicer and Juicebox, and gap filling was performed again. Finally, 38 scaffolds and one circular plasmid were assembled. The genome was temporally annotated by Companion.

SNP calling

Using the new assembly as a reference sequence, we ran the Fixed Ploidy Variant Detection tool in CLC Genomics Workbench 8.5.1. Ploidy was fixed to four according to our results and a previous study [8]. The program returns the list of heterozygous SNPs with index values of probabilities and reliabilities (QUAL). After filtering the results for QUAL = 200 (maximum), we manually verified the SNPs using readmapping. Then, the allele frequencies of those SNPs were drawn as bar-charts as shown in Fig. 2a and Extended data 2 (Additional file 1).

Availability of data and materials

The genome assembly and sequence data for HM-1:IMSS Clone 6 2001 and other strains in this study were deposited at DNA Data Bank of Japan (DDBJ) under AP023109-AP023147, PRJDB8679, and PRJDB8495.

Change history

  • 24 January 2021

    The original article was missing its Extended Data files as Additional files. The article has been updated to include the missing files.



Ribonucleic acid


Disability-adjusted life years


Short interspersed nuclear elements


Long interspersed nuclear elements


Single molecule real-time sequencing


Basic local alignment search tool


Single nucleotide polymorphism


Open reading frames


Anaphase promoting complex/cyclosome


Reads per kilobase of exon per million mapped sequence reads


Deoxyribonucleic acid


Next-generation sequencing


AIDS Clinical Center


National Center for Global Health and Medicine


Phosphate buffered saline


Room temperature


DNA Data Bank of Japan


National Institute of Infectious Diseases


Japan Agency for Medical Research and Development


Japan Initiative for Global Research Network on Infectious Diseases


Science and Technology Research Partnership for Sustainable Development


Japan International Cooperation Agency


  1. 1.

    Global Health Estimates 2015. Burden of disease by cause, age, sex, by country and by region, 2000–2015. Geneva: World Health Organization; 2016.

    Google Scholar 

  2. 2.

    Tovar J, Fischer A, Clark CG. The mitosome, a novel organelle related to mitochondria in the amitochondrial parasite Entamoeba histolytica. Mol Microbiol. 1999;32:1013–21.

    CAS  Article  Google Scholar 

  3. 3.

    Ralston KS, Solga MD, Mackey-Lawrence NM. Somlata, Bhattacharya a, petri WA Jr. Trogocytosis by Entamoeba histolytica contributes to cell killing and tissue invasion. Nature. 2014;508:526–30.

    CAS  Article  Google Scholar 

  4. 4.

    Loftus B, Anderson I, Davies R, Alsmark UC, Samuelson J, Amedeo P, Roncaglia P, Berriman M, Hirt RP, Mann BJ, et al. The genome of the protist parasite Entamoeba histolytica. Nature. 2005;433:865–8.

    CAS  Article  Google Scholar 

  5. 5.

    Aurrecoechea C, Barreto A, Brestelli J, Brunk BP, Caler EV, Fischer S, Gajria B, Gao X, Gingle A, Grant G, et al. AmoebaDB and MicrosporidiaDB: functional genomic resources for Amoebozoa and Microsporidia species. Nucl Acids Res. 2011;39:1–8.

    Article  Google Scholar 

  6. 6.

    Clark CG, Ali IKM, Zaki M, Loftus BJ, Hall N. Unique organisation of tRNA genes in Entamoeba histolytica. Mol Biochem Parasitol. 2006;146:24–9.

    CAS  Article  Google Scholar 

  7. 7.

    Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–93.

    CAS  Article  Google Scholar 

  8. 8.

    Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10:563–9.

    CAS  Article  Google Scholar 

  9. 9.

    Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long error-prone reads using repeat graphs. Nat Biotech. 2019;37:540–6.

    CAS  Article  Google Scholar 

  10. 10.

    Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, Lieberman-Aiden E. Juicebox provides a visualization system for hi-C contact maps with unlimited zoom. Cell Systems. 2016;3(1):95–8.

    CAS  Article  Google Scholar 

  11. 11.

    English AC, Richards S, Han Y, Wang M, Vee V, Qu J, Qin X, Muzny DM, Reid JG, Worley KC, et al. Mind the gap: upgrading genomes with Pacific biosciences RS long-read sequencing technology. PLoS One. 2012;7(11):e47768.

    CAS  Article  Google Scholar 

  12. 12.

    Steinbiss S, Silva-Franco F, Brunk B, Foth B, Hertz-Fowler C, Berriman M, Otto TD. Companion: a web server for annotation and analysis of parasite genomes. Nuc Acid Res. 2016;44:W29–34.

    CAS  Article  Google Scholar 

  13. 13.

    Willhoeft U, Tannich E. The electrophoretic karyotype of Entamoeba histolytica. Mol Biochem Parasitol. 1999;99:41–53.

    CAS  Article  Google Scholar 

  14. 14.

    Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.

    CAS  Article  Google Scholar 

  15. 15.

    Sehgal D, Mittal V, Ramachandran S, Dhar SK, Bhattacharya A, Bhattacharya S. Nucleotide sequence organisation and analysis of the nuclear ribosomal DNA circle of the protozoan parasite Entamoeba histolytica. Mol Biochem Parasitol. 1994;67:205–14.

    CAS  Article  Google Scholar 

  16. 16.

    Diamond LS, Harlow DR, Cunnick CC. A new medium for the axenic cultivation of Entamoeba histolytica and other Entamoeba. Trans R Soc Trop Med Hyg. 1978;72:431–2.

    CAS  Article  Google Scholar 

  17. 17.

    Felgner PL, Gadek TR, Holm M, Roman R, Chan HW, Wenz M, Northrop JP, Ringold GM, Danielsen M. Lipofection: a highly efficient, lipid-mediated DNA-transfection procedure. Biochemistry. 1987;84:7413–7.

    CAS  Google Scholar 

  18. 18.

    Makiuchi T, Santos HJ, Tachibana H, Nozaki T. Hetero-oligomer of dynamin-related proteins participates in the fission of highly divergent mitochondria from Entamoeba histolytica. Sci Rep. 2017;7(1):13439.

    Article  Google Scholar 

  19. 19.

    Aleeza CG, Sarah PO. Ploidy and the causes of genomic evolution. J Hered. 2009;100:571–81.

    Article  Google Scholar 

  20. 20.

    Byers TJ. Molecular biology of DNA in Acanthamoeba, Amoeba, Entamoeba and Naegleria. Int Rev Cytol. 1986;99:311–41.

    CAS  Article  Google Scholar 

  21. 21.

    Friz CT. The biochemical composition of the free living amoebae Chaos chaos, Amoeba dubia and Amoeba proteus. Comp Biochem Physiol. 1968;26:81–90.

    CAS  Article  Google Scholar 

  22. 22.

    Duijf PHG, Nanayakkara D, Nones K, Srihari S, Kalimutho M, Khanna KK. Mechanisms of genomic instability in breast Cancer. Trends Mol Med. 2019;25:595–611.

    CAS  Article  Google Scholar 

  23. 23.

    Dumetz F, Imamura H, Sanders M, Seblova V, Myskova J, Pescher P, Vanaerschot M, Meehan CJ, Cuypers B, De Muylder G, et al. Modulation of aneuploidy in Leishmania donovani during adaptation to different in vitro and in vivo environments and its impact on gene expression. MBio. 2017;8:e00599–17.

    CAS  Article  Google Scholar 

  24. 24.

    Jeganathan KB, Malureanu L, van Deursen JM. The Rae1-Nup98 complex prevents aneuploidy by inhibiting securin degradation. Nature. 2005;438:1036–9.

    CAS  Article  Google Scholar 

  25. 25.

    Morf L, Pearson RJ, Wang AS, Singh U. Robust gene silencing mediated by antisense small RNAs in the pathogenic protist Entamoeba histolytica. Nucleic Acids Res. 2013;41:9424–37.

    CAS  Article  Google Scholar 

Download references


We thank Drs. Makoto Kuroda and Tsuyoshi Sekizuka, Pathogen Genomics Center, NIID for discussions and Ms. Kumiko Shibata for technical assistance on cultivation.


This study is supported in part by Grant for research on emerging and re-emerging infectious diseases from Japan Agency for Medical Research and Development (AMED, JP18fk0108046 to TN and 19fk0108049j1103 to KNT), Grant for research on Japan Initiative for Global Research Network on Infectious Diseases (J-GRID) from AMED (JP17fm0108017 to TN), Grants-in-Aid for Scientific Research (B) and Challenging Research (Exploratory) (KAKENHI JP26293093, JP17K19416, and JP18H02650 to TN) from the Japan Society for the Promotion of Science, Grant from the National Center for Global Health and Medicine Intramural Research Fund (29–2013 to TN), and Grant for Science and Technology Research Partnership for Sustainable Development (SATREPS) from AMED and Japan International Cooperation Agency (JICA) (TN).

Author information




TKS and SI analyzed and interpreted the genome data. TKS and TN wrote the manuscript. YY, YSN, KW, SK, and KNT established the E. histolytica strains and sequenced genomes. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tomoyoshi Nozaki.

Ethics declarations

Ethics approval and consent to participate

Written informed consent was obtained from all participants Entamoeba histolytica clinical strains were isolated from. This study was approved by the ethics committee of the National Center for Global Health and Medicine Center (approval no. NCGM-G-001566-02). This study was implemented in accordance with the provisions of the Declaration of Helsinki.

Consent for publication

Not Applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Extended Data 1-4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Kawano-Sugaya, T., Izumiyama, S., Yanagawa, Y. et al. Near-chromosome level genome assembly reveals ploidy diversity and plasticity in the intestinal protozoan parasite Entamoeba histolytica. BMC Genomics 21, 813 (2020).

Download citation


  • Entamoeba
  • Aneuploidy
  • Expression regulation
  • Hi-C
  • PacBio