Genomic distribution of SINEs in Entamoeba histolytica strains: implication for genotyping
BMC Genomics volume 14, Article number: 432 (2013)
The major clinical manifestations of Entamoeba histolytica infection include amebic colitis and liver abscess. However the majority of infections remain asymptomatic. Earlier reports have shown that some E. histolytica isolates are more virulent than others, suggesting that virulence may be linked to genotype. Here we have looked at the genomic distribution of the retrotransposable short interspersed nuclear elements EhSINE1 and EhSINE2. Due to their mobile nature, some EhSINE copies may occupy different genomic locations among isolates of E. histolytica possibly affecting adjacent gene expression; this variability in location can be exploited to differentiate strains.
We have looked for EhSINE1- and EhSINE2-occupied loci in the genome sequence of Entamoeba histolytica HM-1:IMSS and searched for homologous loci in other strains to determine the insertion status of these elements. A total of 393 EhSINE1 and 119 EhSINE2 loci were analyzed in the available sequenced strains (Rahman, DS4-868, HM1:CA, KU48, KU50, KU27 and MS96-3382). Seventeen loci (13 EhSINE1 and 4 EhSINE2) were identified where an EhSINE1/EhSINE2 sequence was missing from the corresponding locus of other strains. Most of these loci were unoccupied in more than one strain. Some of the loci were analyzed experimentally for SINE occupancy using DNA from strain Rahman. These data helped to correctly assemble the nucleotide sequence at three loci in Rahman. SINE occupancy was also checked at these three loci in 7 other axenically cultivated E. histolytica strains and 16 clinical isolates. Each locus gave a single, specific amplicon with the primer sets used, making this a suitable method for strain typing. Based on presence/absence of SINE and amplification with locus-specific primers, the 23 strains could be divided into eleven genotypes. The results obtained by our method correlated with the data from other typing methods. We also report a bioinformatic analysis of EhSINE2 copies.
Our results reveal several loci with extensive polymorphism of SINE occupancy among different strains of E. histolytica and prove the principle that the genomic distribution of SINEs is a valid method for typing of E. histolytica strains.
Entamoeba histolytica, the etiological agent of amoebiasis, is a protistan parasite that lives in the human intestine. Amoebiasis is the third leading cause of death due to parasitic disease. According to the WHO, about 40–50 million people are infected annually, causing approximately 100,000 deaths worldwide. About 90% of the infections with this parasite remain asymptomatic. What leads to the varied outcome of infection is not known, but it is possible that the genotype of the strain influences the outcome. The suggestion has been made that inherently avirulent strains exist that may be associated with unique genotypes. The E. histolytica strain Rahman is considered to be avirulent in axenic culture since it shows reduced cytopathic activity on epithelial cells and does not form liver abscesses in animal models [5, 6]. Data are, however, insufficient to assign virulence properties to specific genotypes of E. histolytica.
Retrotransposons without long terminal repeats are generally called long interspersed nuclear elements (LINEs) and their short non-autonomous partners are called SINEs. LINEs are generally ~5 kb in length and encode the functions required for retrotransposition, while SINEs are short and do not code for proteins. They utilize the LINE-encoded proteins for their own retrotransposition. Both LINEs and SINEs are efficient genome invaders and are widespread in eukaryotes. In E. histolytica the EhLINEs (4.8 kb) and EhSINEs (0.5 to 0.7 kb) constitute 11.2% of the genome. They belong to three closely related families, of which EhLINE1/EhSINE1 are the most abundant. These elements are present mostly in the intergenic regions [10, 11], with a T-rich sequence within 50 bp upstream of the site of insertion [10, 12]. Due to their mobile nature they can occupy different genomic locations and may influence the phenotype of the organism by activating or silencing the genes in their vicinity. Previous work has shown that a number of SINE1-occupied sites in E. histolytica are unoccupied in the non-pathogenic species Entamoeba dispar and vice versa [11, 13, 14], which may have important consequences for the pathogenicity of the parasite.
A number of studies in different organisms have utilized SINEs as useful markers for phylogeny. It has been argued that SINE insertion analysis is one of the best methods for determining relationships of closely related species since SINEs are widely dispersed in the genome and, unlike DNA transposons, there is no evidence of any process that removes SINEs from the genome once they are inserted. Nonspecific SINE deletions due to unequal crossing over are relatively rare. Thus the absence of a SINE at a particular locus signifies the ancestral state. The probability of independent insertions at the same locus is exceedingly low, which links SINE-containing loci as related by descent [16, 17]. For these reasons population genetic analysis can be performed more accurately with SINEs than with RFLPs and microsatellite loci (where the same allele may be shared by two individuals by chance). Here we have explored the possibility of using EhSINE insertions as strain-specific markers.
Several methods have been developed for the genotyping of this parasite [18–24], each of which has its own limitations. Polymorphisms are observed in short tandem repeat numbers, and repeat sequences present in the genes encoding chitinase and the surface antigen SREHP, as well as in the arrays of tRNA genes of E. histolytica. These have been utilized successfully for strain identification [25, 26]. However, the size variation in most of these loci is small, sometimes making it difficult to detect polymorphism by agarose gel electrophoresis, so DNA sequencing is normally used for confirmation. A transposon display technique was also devised for strain identification based on the genomic distribution of EhSINE1. However, this method is not suitable for use with clinical isolates.
Here we analysed 393 EhSINE1 and 119 EhSINE2 loci present in the HM-1:IMSS strain of E. histolytica for insertion polymorphism in other sequenced strains (http://www.Amoebadb.org) [28, 29]. Seventeen loci were found (13 for EhSINE1 and 4 for EhSINE2) that showed insertion polymorphism. Of these, six loci were validated experimentally in strain Rahman. Three of these loci were tested in 7 other axenically grown strains and 16 clinical isolates. Each of the loci gave a single specific amplicon with the primer sets used, making this a suitable method for genotyping. We also report a bioinformatic analysis of EhSINE2 elements.
Analysis of polymorphic loci
The E. histolytica HM-1:IMSS genome sequence is available in 1529 scaffolds as the full genome could not be assembled into chromosomes. The sequences were downloaded from NCBI [accession number AAFB00000000]. Genome sequences of the other E. histolytica strains, namely HM1:CA, DS4-868, KU27, KU48, KU50, MS96-3382 and Rahman, were downloaded from AmoebaDB (http://www.amoebadb.org). These are partially assembled sequences obtained using next generation sequencing technologies.
Table 1 shows statistics of the genome sequences used in the study. A database of EhSINE1 elements was built based on the results generated by Huntley et al. A total of 393 EhSINE1 elements were included. Elements that were less than 450 bp were omitted. Flanking sequences of 1000 bp from both the 5′- and 3′-ends of all EhSINE1 elements were extracted using a Perl script. The flanking sequences were mapped separately to the contigs of the various strains of E. histolytica using BLAST, and a SINE was used for further analysis only when both of its flanking sequences mapped to a single contig. Presence of EhSINE1 was scored when the distance between the flanking sequences in the target strain was found to be greater than or equal to 450 bp. On the other hand, if the distance between the flanking pairs was less than or equal to 100 bp then the SINE was considered to be missing. All results were validated by manual inspection. Similarly, all the EhSINE2 copies having a length greater than 400 bp and similarity of more than 70% with the EhSINE2 consensus sequence were extracted from the E. histolytica HM-1:IMSS genome. This resulted in 119 EhSINE2 copies, which were analysed for their locus occupancy in the various sequenced strains.
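The scoring rule described above can be illustrated with a minimal Python sketch. It is not the script used in the study: the function name `score_locus`, the coordinate convention (1-based, both flanks on the same strand) and the "ambiguous" and "unscorable" labels are assumptions introduced here; only the 450 bp and 100 bp thresholds come from the text.

```python
from typing import Optional

PRESENT_MIN_BP = 450  # gap >= 450 bp between flanks: SINE scored as present (from the text)
ABSENT_MAX_BP = 100   # gap <= 100 bp between flanks: SINE scored as missing (from the text)


def score_locus(up_contig: Optional[str], up_end: int,
                down_contig: Optional[str], down_start: int) -> str:
    """Score one locus in a target strain from the best hits of its two flanks.

    up_end     -- last base of the upstream-flank hit (hypothetical 1-based coordinate)
    down_start -- first base of the downstream-flank hit
    A contig of None means that the flank had no hit in the target strain.
    """
    # Both flanks must map to the same contig, otherwise the locus is not used.
    if up_contig is None or down_contig is None or up_contig != down_contig:
        return "unscorable"
    gap = down_start - up_end - 1  # bases lying between the two flank hits
    if gap >= PRESENT_MIN_BP:
        return "present"
    if gap <= ABSENT_MAX_BP:
        return "absent"
    # Gaps between the two thresholds are left for manual inspection (assumption).
    return "ambiguous"


print(score_locus("contig_1", 1000, "contig_1", 1551))
print(score_locus("contig_1", 1000, "contig_1", 1013))
```

In this sketch a locus whose flanks fall on different contigs is simply set aside, which mirrors the requirement that both flanking sequences map to a single contig.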
Axenic and xenic cultivation of E. histolytica- Axenic strains HM-1:IMSS and Rahman were maintained by continuous subculturing in TYI-S-33 medium, and the rest of the axenic strains were maintained in LYI-S-2 medium. Xenic strains were maintained by continuous subculturing in Robinson's medium.
Genomic DNA isolation- Genomic DNA of axenic and xenic E. histolytica strains was isolated using a genomic DNA isolation kit (Promega, USA) and the QIAamp® DNA Mini Kit (Qiagen, Germany), respectively, according to the manufacturer’s instructions.
Polymerase chain reaction (PCR)- Primers were designed from the flanking sequences of different EhSINE1 copies obtained from the E. histolytica HM-1:IMSS database (Additional file 1: Figure S1). All PCR reactions were performed with Biotools DNA polymerase (Biotools, B&M Labs, Spain); the PCR programme consisted of initial denaturation for 5 min at 94°C followed by 30 cycles of 30 sec at 94°C, annealing for 30 sec at a temperature dependent on the Tm of the primers used, and an extension time at 72°C dependent on the size of the amplicon. Products were resolved on a 1% agarose gel (USB, Spain) containing 0.5 μg/ml of ethidium bromide using 0.5X TBE (Tris borate EDTA, pH 8) buffer.
Southern blotting and hybridization- DNA was transferred to Hybond™-N+ nylon membrane (GE Healthcare) using standard methods. Labeled probes were prepared using α-32P–dATP by the random priming method using the NEBlot® kit (NEB, USA) according to the manufacturer’s instructions. Blots were hybridized overnight with probe at 65°C in a solution of 1% SDS, 1 M NaCl and 100 μg/ml of salmon sperm DNA, washed to remove nonspecific probe, exposed (Fujifilm) and scanned by phosphorimager.
DNA sequencing- Amplicons were extracted from agarose gels using a gel extraction kit (Qiagen) and cloned into the pGEM-T vector (Promega, USA). Sequences were generated commercially (TCGA, India) and compared using ClustalW software (Bioedit).
Analysis of target site duplication (TSD) and internal repeats (IRs) using MEME- The online tool MEME was used for the analysis of TSDs and IRs of SINE2. 50 bp of sequence upstream and downstream of each EhSINE2 were extracted from the E. histolytica HM-1:IMSS genome and these were analysed for TSD. Since the longest TSDs found were in the range of 16–20 bp, and some of the shorter TSDs may result from accumulation of mutations in older SINE insertions, TSDs having size < 8 bp were excluded. The input consisted of 79 FASTA-formatted sequences of TSDs with the default settings of width (minimum 6 and maximum 50) and the search was optimized for identifying zero or one motif per sequence. For IR analysis 150 sequences were subjected to MEME analysis in a similar way.
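The text does not state how candidate TSDs were located within the 50 bp flanks before being passed to MEME. One simple possibility, shown here purely as an assumption, is to take the longest exact substring shared by the upstream and downstream flanks and to discard it if it is shorter than 8 bp (the cut-off given above). The function names are hypothetical and the example sequences are invented.

```python
from typing import Optional

MIN_TSD_BP = 8  # TSDs shorter than 8 bp were excluded (cut-off from the text)


def longest_shared_substring(a: str, b: str) -> str:
    """Return the longest exact substring common to a and b (first one found on ties)."""
    best = ""
    prev = [0] * (len(b) + 1)  # prev[j]: length of the match ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


def candidate_tsd(upstream: str, downstream: str) -> Optional[str]:
    """Return a candidate TSD shared by the two flanks, or None if it is under 8 bp."""
    hit = longest_shared_substring(upstream.upper(), downstream.upper())
    return hit if len(hit) >= MIN_TSD_BP else None


# Invented flanks sharing an 11 bp direct repeat next to the insertion site.
print(candidate_tsd("ACGTACGGCATTTCTTATTGA", "TTTCTTATTGACCGGAGGCTC"))
```

An exact-match search of this kind would miss TSDs that have accumulated point mutations, so it should be read as a lower bound on what a curated analysis would report.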
Results and discussion
Identification of genomic loci with differential EhSINE1/EhSINE2 occupancy in the sequenced E. histolytica strains
The availability of genome sequences of a number of E. histolytica strains is likely to help define the level of polymorphism in SINE distribution in E. histolytica. EhSINE1 (445 copies) and EhSINE2 (256 copies) constitute the majority of the SINE population of E. histolytica. There are only 49 copies of EhSINE3; therefore we focused only on EhSINE1 and EhSINE2 for this study. Out of 445 copies of EhSINE1, 393 are full-length (>450 bp), and only full-length copies were used for analysis. We performed a similar analysis with EhSINE2 and found 119 full-length copies (length >400 bp and similarity >70% with the EhSINE2 consensus) in strain HM-1:IMSS.
Insertion polymorphism of EhSINEs 1 and 2 was detected by comparing the genomic location of all full length copies in strain HM-1:IMSS with the same loci in strain Rahman (which has lost virulence in axenic culture). Flanking sequences surrounding each SINE (1 kb from both sides) were taken into consideration in identifying the SINE-containing loci. An element was considered to be present when along with SINE the flanking sequences were the same in the two strains. The results of this analysis are presented in Figures 1 and 2. Out of 393 full length EhSINE1 copies it was possible to do this analysis for only 270 due to an inability to extract one of the flanking sequences for the rest, because either the SINE was present at the end of the scaffold or was flanked by repetitive sequences (Figure 1). Further, out of these 270 copies, full length EhSINE1 copies could be clearly mapped in Rahman in only 114 cases; in others this was not possible as the upstream and downstream sequences were in different scaffolds of Rahman. Additionally, we did not consider 42 EhSINE1 loci as there were undefined nucleotides at many positions. Finally, we found 4 loci where the flanking sequences in strains HM-1:IMSS and Rahman were conserved but the EhSINE1 sequences were completely missing in Rahman, as against 114 loci where EhSINE1 was present in both strains.
Similarly, out of the 119 full-length copies of EhSINE2 it was possible to use only 69 copies for our analysis (Figure 2), and only 2 unoccupied sites were identified in Rahman following the criteria described for EhSINE1. Since the total number of unoccupied sites obtained was rather small (4 out of 270 for EhSINE1, and 2 out of 119 for EhSINE2), we checked to see if we were missing some polymorphic loci in the copies that could not be computationally analyzed. PCR primers were designed using the genes flanking a number of EhSINE1 loci in HM-1:IMSS and were used to amplify the same loci from genomic DNA of strain Rahman. A total of 159 loci were tested from the various categories listed in Figure 1. Of these, the amplicon size in Rahman was identical with HM-1:IMSS at 157 loci, showing that these loci were all occupied, while at the remaining two loci (17 and 19) the EhSINE1 was absent from Rahman. Locus 17 was missed in the computational analysis because the sequence of the SINE, and some sequence upstream of it, contained undefined nucleotides in Rahman. In the case of locus 19 the corresponding sequence was located in three different contigs in Rahman. Therefore the combined experimental and computational analysis allowed us to identify 6 EhSINE1 loci that are polymorphic between strains HM-1:IMSS and Rahman.
A number of E. histolytica strains (DS4-868, KU27, KU48, KU50, MS96-3382, HM1:CA), for which Next Generation Sequencing (NGS) data are currently available, were analyzed using the approach described above. Since NGS output is in the form of short sequence reads which are assembled into a large number of scaffolds, it is likely that a number of polymorphic sites were missed in this analysis. A total of 17 polymorphic loci (13 EhSINE1 loci and 4 EhSINE2 loci) were found (Table 2). Out of the 17, 9 loci were polymorphic in more than one strain. The results suggest that SINE insertion polymorphism is widespread among strains and isolates of E. histolytica. Analysis of sequence in the database at sites where the SINEs were scored absent showed that in some cases a small fragment of the SINE sequence was still present, and in some others a part of the flanking sequence was missing (Table 2). We cross-checked this by sequencing some of these loci in Rahman and present evidence below that there was actually no SINE sequence left at these loci, and the reported sequence in the database was erroneous. Such assembly errors may be expected when dealing with highly repetitive sequences. We have not cross-checked all the loci and cannot comment on the status of these.
Of the eight predicted polymorphic loci in strain Rahman we validated experimentally six using PCR (Figures 3 and 4) with primers designed from the flanking sequences of EhSINE1/EhSINE2 in HM-1:IMSS (Additional file 1: Figure S1 and Additional file 2: Table S1). The absence of SINE sequences was inferred from the size of the amplicon (smaller by the size of SINE) and by Southern hybridization using a SINE sequence as a probe. The amplicon sizes in Rahman from three EhSINE1 polymorphic loci (13, 17 and 19) were smaller by about 550 bp suggesting that indeed these sites lacked EhSINE1. This was also confirmed by Southern hybridization (Figure 3B, bottom panel). In contrast, the amplicon size of another polymorphic EhSINE1 locus (42) was actually larger by 1.5 kb in Rahman. Probing a Southern blot of the amplicon using EhSINE1-flanking sequences from locus 42 confirmed that the amplified region in Rahman indeed belonged to the same locus (Additional file 3: Figure S2). However, two different sets of primers designed using the HM-1:IMSS sequence at this locus failed to produce an amplicon in Rahman. Therefore it appears that this locus may have undergone multiple changes and is not a simple case of SINE absence. We did not analyse this locus further. The two predicted EhSINE2 polymorphic loci (18 and 50) were also validated using PCR and Southern hybridization (Figure 4 ii and iii). At both loci the amplicons from Rahman were 700 bp shorter (the size of EhSINE2).
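The size-based reasoning used here can be written down as a small Python sketch. The element sizes (about 550 bp for EhSINE1 and 700 bp for EhSINE2) are the values quoted above; the function name, the 100 bp tolerance standing in for gel resolution, and the example amplicon sizes are assumptions made for illustration only.

```python
# Approximate element sizes quoted in the text for the validated loci.
SINE_LENGTH_BP = {"EhSINE1": 550, "EhSINE2": 700}


def interpret_amplicon(reference_bp: int, observed_bp: int, family: str,
                       tolerance_bp: int = 100) -> str:
    """Compare a test-strain amplicon with the HM-1:IMSS amplicon of the same locus.

    tolerance_bp is a hypothetical allowance for sizing error on an agarose gel.
    """
    shift = reference_bp - observed_bp  # positive when the test amplicon is shorter
    if abs(shift) <= tolerance_bp:
        return "occupied"   # same size as HM-1:IMSS, SINE presumably present
    if abs(shift - SINE_LENGTH_BP[family]) <= tolerance_bp:
        return "empty"      # shorter by about one SINE length
    return "complex"        # any other change, as seen at locus 42


print(interpret_amplicon(1400, 850, "EhSINE1"))
print(interpret_amplicon(1400, 2900, "EhSINE1"))
```

A size shift alone cannot prove that the missing segment is the SINE, which is why the calls in the study were also confirmed by Southern hybridization with a SINE probe.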
These loci were also found to be polymorphic among different strains and isolates of E. histolytica as deduced from analysis of NGS data (compiled in Additional file 4: Table S2). In some strains, although the SINE was present at the locus, the sequence showed some truncations or short deletions. If these changes are not due to assembly errors in the database, one could envision various factors that may contribute to this. Most of the truncations were at the 5′-end of the SINE, which could result from the well-known phenomenon of incomplete reverse transcription of the SINE RNA template during retrotransposition. Short deletions may appear due to recombination between genomic SINE copies, or due to replication slippage at the short internal repeats in the EhSINEs (described later). However, some of these changes are, indeed, due to sequence assembly errors in the database, which we document below for locus 17 in strains Rahman and MS96-3382.
Sequence analysis of some of the polymorphic loci in strains HM-1:IMSS and Rahman
Sequence data available for the two genomes in AmoebaDB shows that the assembled genome data of Rahman has many more undefined regions and gaps. There are 1529 scaffolds defining the HM-1:IMSS genome (in the size range of 0.9 kb-500 kb) compared to 1145 of Rahman (in the size range of 2 kb-170 kb) and 17378 unassembled contigs. We examined the sequences at loci 13, 17, 19 and 42 more closely and found that the locus 13 sequence was located in a single scaffold in both strains and the sequence was identical except for the loss of EhSINE1 in Rahman. However, the sequences at the other loci were either found in multiple scaffolds/contigs in Rahman, or contained undefined regions, as described below.
Locus 17 was present in scaffold DS571247 (HM-1:IMSS) and EhRmscaffold_00561 (Rahman). Closer examination showed that although most of the EhSINE1 sequence was missing at this locus in Rahman, a stretch of 84 bp still remained at the 5′ end (Additional file 4: Table S2 and Additional file 5: Figure S4). This was followed by a large region of undefined sequence (~750 bp), and if this is an accurate estimate of its size we should obtain amplicons of similar size in both strains. However our data clearly showed that the amplicon in strain Rahman was shorter by 0.5 kb and it did not hybridize with a probe from EhSINE1 sequence (Figure 3). To further verify our results we cloned and sequenced these amplicons from both the strains. Sequence comparison showed that the entire stretch of EhSINE1 was missing in Rahman (Figure 5). EhSINE1 insertion is typically accompanied by target site duplication (TSD) and the Rahman sequence had only one copy of the TSD seen in HM-1:IMSS. The rest of the flanking sequence was identical in the two strains. The 84 bp piece of EhSINE1 shown in the database at this locus was not found in our sequence; rather the entire EhSINE1 was missing. We believe this discrepancy could have arisen due to assembly errors in the database.
Locus 19 was present in the scaffolds DS571126 (HM-1:IMSS) and EhRmscaffold_00536 (Rahman). The sequence upstream of the EhSINE1 location in HM-1:IMSS was undefined in Rahman. However we found three unassembled contigs (EhRmcontig_00303, EhRmcontig_00523 and EhRm_contig21711) in the Rahman database that matched the HM-1:IMSS sequence (Additional file 6: Figure S5). An amplicon from Rahman generated by PCR amplification using a primer each designed from EhRmcontig_00523 and EhRcontig_21011 displayed the expected size (Figure 3B), showing that these contigs likely belong to this locus. Sequence analysis of the amplicon confirmed that the two strains were identical except for the loss of EhSINE1 in Rahman (Figure 5).
Locus 42 in HM-1:IMSS was in one scaffold (DS571158), while in Rahman the syntenic sequence was present across three different scaffolds/contigs (Additional file 7: Figure S3). One contig spanned the downstream gene sequence with which primer 42.1 R was an exact match. However, in primer 42.1 F (Additional file 7: Figure S3) the 3′ nucleotide was a mismatch. Sequence comparison of this region revealed single nucleotide differences at several positions, which may explain our failure to amplify this locus from Rahman using HM-1:IMSS primers.
These results suggest that some of the sequence data currently available in the database needs reanalysis and the predictions need to be validated by experimentation. Our analysis has helped to correctly assemble the sequences at loci 17, 19 and 42 in Rahman.
Genotyping using SINE sequences
We explored the possibility of using some of the polymorphic loci as markers for genotyping. For this we focused on loci 13, 17 and 19 and tested them using 23 axenic and xenic strains of E. histolytica. A genotyping method would need to be used for patient samples, where large amplicons may be difficult to obtain reproducibly due to impurities in DNA preparation and low E. histolytica DNA concentrations. We therefore designed primers as close to the EhSINE1 insertion site as possible to minimize amplicon size (Additional file 1: Figure S1). For each locus two primer sets were used; one set was designed from flanking sequences and the other set comprised one of the flanking primers combined with a primer from the EhSINE1 sequence (Figure 6A and Additional file 2: Table S1). Although care was taken to design primers for each locus that did not match the Entamoeba dispar genome, this was not possible in all cases due to extensive sequence conservation between the two species. However one primer from each pair for all three loci had no match in E. dispar (Additional file 2: Table S1). The amplicons obtained with each of the primer pairs for a given locus were combined and electrophoresed together in the same gel lane (Figure 6B shows the results for axenic strains). The identities of the bands were confirmed by Southern hybridization with a flanking region probe (middle panel, Figure 6B) or an EhSINE1 probe (bottom panel, Figure 6B). DNA from strains HM-1:IMSS and Rahman gave the expected amplicon with each primer pair, except for the 1.4 kb band with primers 13.1 F and 13.2 R expected from HM-1:IMSS, which could not be amplified efficiently. Hence HM-1:IMSS locus 13 was identified by the 0.2 kb 13.1 F/SINE R product. Results with the seven axenic strains showed that EhSINE1 was present at all three loci in strains MS84-1373 and MS27-5030. In this respect they behaved like HM-1:IMSS. 
However, primer set 17.2 F-17.2 R could not amplify MS84 and primer set 17.2 R-SINE R could not amplify MS27, indicating that they were not identical to HM-1:IMSS at locus 17. Single nucleotide mutations in the flanking sequences could lead to sequence polymorphisms in these regions and give the observed result due to loss of primer recognition. Since the sequence of this region is not known in these other strains, an explanation for this result would have to await further sequence data. Similarly, strain HK-9 resembled Rahman at all three loci in terms of EhSINE1 occupancy but belonged to a third category, since at locus 13 it repeatedly failed to give the expected amplicon size with primer pair 13.1 F-13.2 R although the expected amplicon was obtained with primer pair 13.1 F-13.1 R (Figure 7A). Strains PVBM08B and PVBM08F were like Rahman at locus 17 and like HM-1:IMSS at loci 13 and 19. Strain MS96-3382 was like Rahman at loci 13 and 17. However, genome sequence analysis (AmoebaDB) showed the presence of a 397 bp SINE sequence (truncated from both ends) at locus 17 in this strain. Since the PCR and Southern data for this locus were unambiguous, we are inclined to believe that, as mentioned earlier (Figure 5), the discrepancy between our data and AmoebaDB may be due to sequence assembly problems. Strain 200:NIH was like Rahman at loci 17 and 19. Thus, based on the presence and absence of SINE1, and the amplicons obtained with each primer pair at these three loci, the axenic strains could be divided into five genotypes (Table 3).
The same primer pairs were used for analysis of 16 clinical isolates of E. histolytica (Figure 7, Additional file 8: Table S3). The results are summarized in Table 3. The amplicons were clearly visible only after Southern hybridization for most clinical isolates. The results clearly show mosaic patterns in the three loci, displaying characters of both HM-1:IMSS and Rahman in many strains.
To sum up the above data, a total of 25 E. histolytica strains were used in this study, of which HM-1:IMSS contains EhSINE1 at all three loci (HHH), while Rahman lacks the element at all three loci (RRR). In the remaining 23 E. histolytica strains (including axenic and xenic clinical isolates), EhSINE1 was absent at loci 13, 17 and 19 in 7, 10 and 8 strains respectively. Based on the presence/absence of EhSINE1, and amplicons obtained with the primer pairs at these three loci, the 23 strains were categorized into eleven genotypes (Table 3). Based on SINE occupancy alone there can only be eight combinations at the three loci (i.e. 2³). Additional variations (designated N, which are neither H nor R) have come about due to alterations in flanking sequences leading to loss of primer recognition sites. In the 23 strains tested the most frequent combination was HHH (5 strains), followed by HRR and HRH (3 strains each) and HNH, NRR, HNR and NNH (2 strains each). The use of multiple loci for strain identification is preferred [23, 25] as a single locus cannot differentiate all the strains. The results obtained by our method were consistent with the data from tRNA-STRs. Both methods distinguished the strains HM-1:IMSS, Rahman, 200:NIH and HK-9 from one another [20, 25, 26] and gave the same pattern for strains PVB and PVF (Clark C.G., unpublished observation). Thus our results suggest that in principle genomic distribution of SINEs can be used as a valid method for typing of E. histolytica strains.
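The counting argument in this paragraph can be checked with a short Python sketch. The genotype frequencies are those quoted above; the variable names are introduced here for illustration, and the remainder computed at the end is an inference from the stated totals (23 strains, eleven genotypes), not a figure reported in the text.

```python
from itertools import product

N_LOCI = 3  # loci 13, 17 and 19

# Two states per locus (H = like HM-1:IMSS, R = like Rahman) give 2**3 patterns.
occupancy_only = ["".join(p) for p in product("HR", repeat=N_LOCI)]
# Allowing the third state N (neither H nor R) gives 3**3 possible patterns.
with_n = ["".join(p) for p in product("HRN", repeat=N_LOCI)]

# Genotype frequencies quoted in the text for the 23 strains.
reported_counts = {"HHH": 5, "HRR": 3, "HRH": 3,
                   "HNH": 2, "NRR": 2, "HNR": 2, "NNH": 2}
TOTAL_STRAINS = 23
TOTAL_GENOTYPES = 11

strains_listed = sum(reported_counts.values())
remaining_strains = TOTAL_STRAINS - strains_listed
remaining_genotypes = TOTAL_GENOTYPES - len(reported_counts)

print(len(occupancy_only), len(with_n))
print(strains_listed, remaining_strains, remaining_genotypes)
```

Under these totals the genotypes not listed by name would each be represented by a single strain, and every reported pattern is one of the 27 three-letter combinations available once N is allowed.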
Although SINEs are mobile genetic elements, their mobilization in present-day E. histolytica is probably a very infrequent event. This can be inferred from the fact that most genomic copies of the EhLINE1 retrotransposon (which provides the machinery for EhSINE1 mobilization through retrotransposition) are inactive. We have shown experimentally that the retrotransposition activity in these cells is very low or absent. Therefore the genomic location of SINEs in a given strain is stable enough to be used as a strain-specific signature.
Bioinformatic analysis of EhSINE2 copies
Although a detailed bioinformatic analysis of EhSINE1 has been published, a similar analysis of EhSINE2 has not been reported. Therefore we decided to carry out an analysis of EhSINE2 using the approach that has been described for EhSINE1. All sequences that displayed similarity of more than 70% with the consensus sequence and a length of more than 400 bp were extracted from the genome sequence of E. histolytica available at NCBI (total 119). These were analysed for internal repeats (IRs) by using Tandem Repeats Finder. Some of the EhSINE2 sequences also contained IRs, as reported in EhSINE1 (which contains 26–27 bp IRs). EhSINE2 copies could be categorized into distinct classes based on the number of IRs (Figure 8). The class with three IRs was the most common, followed by those with two, one and four IRs, respectively (Figure 8). A single copy each of 5 and 13 IR-containing EhSINE2s was also found. About half the EhSINE2 copies either lacked an IR or contained only a fragment of one. We also found one copy each of EhSINE2s that matched the length expected of copies with 1 IR and 3 IRs, but in fact contained no IR at all. These observations are similar to EhSINE1, where it was reported that 60% of the copies had either no IR or had the appropriate length for 3 IRs but only one out of three IRs was recognizable. We analyzed the IR sequences of all EhSINE2 copies and extracted 150 IR sequences; the majority were 20 bp in length except four, in which the IR was 13–14 bp. A common motif present in these IR sequences was identified by the online motif search tool MEME to be AATGAATAACAATACACG/CTT/C.
As already mentioned, retrotransposition is accompanied by generation of TSDs. Newly retrotransposed copies are expected to be flanked by identical TSDs, while over time these accumulate mutations, become shorter in length and are finally unrecognizable. Therefore the length of TSDs may be a marker of the age of SINEs. We analyzed the TSDs of all 119 EhSINE2 copies, and could find a TSD in 97 cases. The longest TSDs (ranging in size from 16–20 bp) were found in elements with IRs, while copies lacking intact IRs displayed smaller TSDs, in the range of 8–9 bp (Figure 8). This suggests that copies lacking IRs may be older and may have suffered loss of IR sequences subsequent to retrotransposition. In the case of EhSINE1, the 2 IR-containing copies were reported to be the most recently transposed elements as they had longer TSDs than the other copies. The TSDs of 81 EhSINE2 sites (excluding those below 8 bp in length) were analyzed by MEME. All 81 TSDs showed the consensus motif T(T/C)T(T/C)TN(A/T)T, suggesting a high percentage of pyrimidines is needed at the insertion point.
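For readers who wish to screen their own sequences, the consensus T(T/C)T(T/C)TN(A/T)T can be expressed as a regular expression. The following Python sketch is an illustration only: the helper names are hypothetical, the example sequence is invented, and an exact-match regular expression is a stricter test than the probabilistic motif model that MEME reports.

```python
import re

# T(T/C)T(T/C)TN(A/T)T with N expanded to any of the four bases.
TSD_CONSENSUS = re.compile(r"T[TC]T[TC]T[ACGT][AT]T")


def has_consensus(seq: str) -> bool:
    """True if the sequence contains an exact match to the TSD consensus."""
    return TSD_CONSENSUS.search(seq.upper()) is not None


def pyrimidine_fraction(seq: str) -> float:
    """Fraction of C and T bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "CT" for base in seq) / len(seq)


example = "ggTTTCTGATcc"  # invented sequence containing TTTCTGAT
print(has_consensus(example), round(pyrimidine_fraction(example), 2))
```

Because six of the eight consensus positions are fixed as T or restricted to T/C, any exact match is necessarily pyrimidine-rich, in line with the interpretation given above.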
SINE elements are useful genomic markers due to their wide occurrence and property of irreversible re-integration in the host genome. The loss of SINEs from genomic loci is a rare event and is generally accompanied by changes in flanking sequences as well. Therefore, as stated earlier, SINEs are better suited to establish genealogies below the species level with minimal assumptions compared with other standard markers, such as microsatellites, RFLPs, and SNPs, which can result from independent mutations at different times that are not inherited from a common ancestor [16, 42–46]. For this reason the analysis of SINE occupancy in E. histolytica strains reported here will be significant to establish intraspecific relationships.
Retrotransposons are known to influence the expression of genes in their vicinity by various mechanisms, including silencing by heterochromatinization, up-regulation by providing alternate promoters, and novel expression patterns through alternative splicing and polyadenylation [47–50]. Thus the gain or loss of EhSINE1 element from a genomic locus could potentially influence the phenotype of the organism in a profound manner. For this reason the strain typing method used here has a potential to reveal loci that may be associated with different phenotypes, including the virulence properties of the parasite. However more samples need to be tested to provide a correlation between virulence and genotype. A combination of rapid genome sequencing and expression analysis from a variety of clinical isolates of E. histolytica by NGS will reveal whether retrotransposons in E. histolytica have the ability to influence neighboring gene expression. This method of strain typing based on retrotransposon occupancy could then have physiological relevance.
World Health Organization: Amoebiasis. Wkly Epidemiol Rec. 1997, 72: 97-100.
Haque R, Huston CD, Hughes M, Houpt E, Petri WA Jr: Current concepts: Amebiasis. N Engl J Med. 2003, 348: 1565-1573. 10.1056/NEJMra022710.
Ali IK, Mondal U, Roy S, Haque R, Petri WA Jr, Clark CG: Evidence for a link between parasite genotype and outcome of infection with Entamoeba histolytica. J Clin Microbiol. 2007, 45: 285-289. 10.1128/JCM.01335-06.
Escueta-de Cadiz A, Kobayashi S, Takeuchi T, Tachibana H, Nozaki T: Identification of an avirulent Entamoeba histolytica strain with unique tRNA-linked short tandem repeat markers. Parasitol Int. 2010, 59: 75-81. 10.1016/j.parint.2009.10.010.
Ankri S, Padilla-Vaca F, Stolarsky T, Koole L, Katz U, Mirelman D: Antisense inhibition of expression of the light subunit (35 kDa) of the Gal/GalNac lectin complex inhibits Entamoeba histolytica virulence. Mol Microbiol. 1999, 33: 327-337. 10.1046/j.1365-2958.1999.01476.x.
Dvorak J, Kobayashi AS, Nozaki T, Takeuchi T, Matsubara C: Induction of permeability changes and death of vertebrate cells is modulated by the virulence of Entamoeba spp. isolates. Parasitol Int. 2003, 52: 169-173. 10.1016/S1383-5769(02)00090-9.
Singer MF: SINEs and LINEs: highly repeated short and long interspersed sequences in mammalian genomes. Cell. 1982, 28: 433-434. 10.1016/0092-8674(82)90194-5.
Dewannieux M, Heidmann T: LINEs, SINEs and processed pseudogenes: parasitic strategies for genome modeling. Cytogenet Genome Res. 2005, 110: 35-48. 10.1159/000084936.
Lorenzi H, Thiagarajan M, Haas B, Wortman J, Hall N, Caler E: Genome wide survey, discovery and evolution of repetitive elements in three Entamoeba species. BMC Genomics. 2008, 9: 595. 10.1186/1471-2164-9-595.
Bakre AA, Rawal K, Ramaswamy R, Bhattacharya A, Bhattacharya S: The LINEs and SINEs of Entamoeba histolytica: comparative analysis and genomic distribution. Exp Parasitol. 2005, 110: 207-213. 10.1016/j.exppara.2005.02.009.
Kumari V, Sharma R, Yadav VP, Gupta AK, Bhattacharya A, Bhattacharya S: Differential distribution of a SINE element in the Entamoeba histolytica and Entamoeba dispar genomes: Role of the LINE-encoded endonuclease. BMC Genomics. 2011, 12: 267. 10.1186/1471-2164-12-267.
Mandal PK, Bagchi A, Bhattacharya A, Bhattacharya S: An Entamoeba histolytica LINE/SINE pair inserts at common target sites cleaved by the restriction enzyme-like LINE encoded endonuclease. Eukaryot Cell. 2004, 3: 170-179. 10.1128/EC.3.1.170-179.2004.
Willhoeft U, Buss H, Tannich E: The abundant polyadenylated transcript 2 DNA sequence of the pathogenic protozoan parasite Entamoeba histolytica represents a nonautonomous non-long-terminal-repeat retrotransposonlike element which is absent in the closely related nonpathogenic species Entamoeba dispar. Infect Immun. 2002, 70: 6798-6804. 10.1128/IAI.70.12.6798-6804.2002.
Shire AM, Ackers JP: SINE elements of Entamoeba dispar. Mol Biochem Parasitol. 2007, 152: 47-52. 10.1016/j.molbiopara.2006.11.010.
Shedlock AM, Okada N: SINE insertions: powerful tools for molecular systematics. Bioessays. 2000, 22: 148-160. 10.1002/(SICI)1521-1878(200002)22:2<148::AID-BIES6>3.0.CO;2-Z.
Batzer MA, Stoneking M, Alegria-Hartman M, Bazan H, Kass DH, Shaikh TH, Novick GE, Ioannou PA, Scheer WD, Herrera RJ: African origin of human-specific polymorphic Alu insertions. Proc Natl Acad Sci USA. 1994, 91: 12288-12292. 10.1073/pnas.91.25.12288.
Shimamura M, Yasue H, Ohshima K, Abe H, Kato H, Kishiro T, Goto M, Munechika I, Okada N: Molecular evidence from retroposons that whales form a clade within even-toed ungulates. Nature. 1997, 388: 666-670. 10.1038/41759.
Clark CG: Methods for the investigation of diversity in Entamoeba histolytica. Arch Med Res. 2006, 37: 258-262. 10.1016/j.arcmed.2005.09.002.
Sargeaunt PG, Williams JE, Grene JD: The differentiation of invasive and non-invasive Entamoeba histolytica by isoenzyme electrophoresis. Trans R Soc Trop Med Hyg. 1978, 72: 519-521. 10.1016/0035-9203(78)90174-8.
Clark CG, Diamond LS: Entamoeba histolytica: a method for isolate identification. Exp Parasitol. 1993, 77: 450-455. 10.1006/expr.1993.1105.
Stanley SL Jr, Becker A, Kunz-Jenkins C, Foster L, Li E: Cloning and expression of a membrane antigen of Entamoeba histolytica possessing multiple tandem repeats. Proc Natl Acad Sci USA. 1990, 87: 4976-4980. 10.1073/pnas.87.13.4976.
Köhler S, Tannich E: A family of transcripts (K2) of Entamoeba histolytica contains polymorphic repetitive regions with highly conserved elements. Mol Biochem Parasitol. 1993, 59: 49-58. 10.1016/0166-6851(93)90006-J.
Haghighi A, Kobayashi S, Takeuchi T, Masuda G, Nozaki T: Remarkable genetic polymorphism among Entamoeba histolytica isolates from a limited geographic area. J Clin Microbiol. 2002, 40: 4081-4090. 10.1128/JCM.40.11.4081-4090.2002.
Haghighi A, Kobayashi S, Takeuchi T, Thammapalerd N, Nozaki T: Geographic diversity among genotypes of Entamoeba histolytica field isolates. J Clin Microbiol. 2003, 41: 3748-3756. 10.1128/JCM.41.8.3748-3756.2003.
Zaki M, Clark CG: Isolation and characterization of polymorphic DNA from Entamoeba histolytica. J Clin Microbiol. 2001, 39: 897-905. 10.1128/JCM.39.3.897-905.2001.
Ali IK, Zaki M, Clark CG: Use of PCR amplification of tRNA gene-linked short tandem repeats for genotyping Entamoeba histolytica. J Clin Microbiol. 2005, 43: 5842-5847. 10.1128/JCM.43.12.5842-5847.2005.
Srivastava S, Bhattacharya S, Paul J: Species- and strain-specific probes derived from repetitive DNA for distinguishing Entamoeba histolytica and Entamoeba dispar. Exp Parasitol. 2005, 110: 303-308. 10.1016/j.exppara.2005.02.020.
Aurrecoechea C, Barreto A, Brestelli J, Brunk BP, Caler EV, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Iodice J, Kissinger JC, Kraemer ET, Li W, Nayak V, Pennington C, Pinney DF, Pitts B, Roos DS, Srinivasamoorthy G, Stoeckert CJ Jr, Treatman C, Wang H: AmoebaDB and MicrosporidiaDB: functional genomic resources for Amoebozoa and Microsporidia species. Nucleic Acids Res. 2011, 39 (Database issue): D612-619.
Aurrecoechea C, Brestelli J, Brunk BP, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, Heiges M, Innamorato F, Iodice J, Kissinger JC, Kraemer ET, Li W, Miller JA, Nayak V, Pennington C, Pinney DF, Roos DS, Ross C, Srinivasamoorthy G, Stoeckert CJ Jr, Thibodeau R, Treatman C, Wang H: EuPathDB: a portal to eukaryotic pathogen databases. Nucleic Acids Res. 2010, 38 (Database issue): D415-419.
Huntley DM, Pandis I, Butcher SA, Ackers JP: Bioinformatic analysis of Entamoeba histolytica SINE1 elements. BMC Genomics. 2010, 11: 321. 10.1186/1471-2164-11-321.
McGinnis S, Madden TL: BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 2004, 32 (Web Server issue): W20-5.
Bakre A: PhD thesis. Comparative and functional analysis of Entamoeba histolytica genome. 2005, New Delhi, India: Jawaharlal Nehru University, School of Environmental Science
Diamond LS, Harlow DR, Cunnick CC: A new medium for the axenic cultivation of Entamoeba histolytica and other Entamoeba. Trans R Soc Trop Med Hyg. 1978, 72: 431-432. 10.1016/0035-9203(78)90144-X.
Clark CG, Diamond LS: Methods for cultivation of luminal parasitic protists of clinical importance. Clin Microbiol Rev. 2002, 15: 329-341. 10.1128/CMR.15.3.329-341.2002.
Robinson GL: The laboratory diagnosis of human parasitic amoebae. Trans R Soc Trop Med Hyg. 1968, 62: 285-94. 10.1016/0035-9203(68)90170-3.
Sambrook J, Russell DW: Molecular cloning: a laboratory manual, 3rd ed. 2001, Cold Spring Harbor: Cold Spring Harbor Laboratory Press
Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994, 2: 28-36.
Furano AV: The biological properties and evolutionary dynamics of mammalian LINE-1 retrotransposons. Prog Nucleic Acid Res Mol Biol. 2000, 64: 255-294.
Yadav VP, Mandal PK, Bhattacharya A, Bhattacharya S: Recombinant SINEs are formed at high frequency during induced retrotransposition in vivo. Nat Commun. 2012, 3: 854.
Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999, 27: 573-580. 10.1093/nar/27.2.573.
Edwards MC, Gibbs RA: A human dimorphism resulting from loss of an Alu. Genomics. 1992, 14: 590-597. 10.1016/S0888-7543(05)80156-9.
Batzer M, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet. 2001, 3: 370-379.
Batzer MA, Deininger PL: A human-specific (HS) subfamily of Alu sequences. Genomics. 1991, 9: 481-487. 10.1016/0888-7543(91)90414-A.
Sherry ST, Harpending HC, Batzer MA, Stoneking M: Alu evolution in human populations: using the coalescent to estimate effective population size. Genetics. 1997, 147: 1977-1982.
Stoneking M, Fontius JJ, Clifford SL, Soodyall H, Arcot SS, Saha N, Jenkins T, Tahir MA, Deininger PL, Batzer MA: Alu insertion polymorphisms and human evolution: evidence for a larger population size in Africa. Genome Res. 1997, 7: 1061-1071.
Watkins WS, Rogers AR, Ostler CT, Wooding S, Bamshad MJ, Brassington AM, Carroll ML, Nguyen SV, Walker JA, Prasad BV, Reddy PG, Das PK, Batzer MA, Jorde LB: Genetic variation among world populations: inferences from 100 Alu insertion polymorphisms. Genome Res. 2003, 13: 1607-1618. 10.1101/gr.894603.
Chow JC, Ciaudo C, Fazzari MJ, Mise N, Servant N, Glass JL, Attreed M, Avner P, Wutz A, Barillot E, Greally JM, Voinnet O, Heard E: LINE-1 activity in facultative heterochromatin formation during X chromosome inactivation. Cell. 2010, 141: 956-969. 10.1016/j.cell.2010.04.042.
Estécio MR, Gallegos J, Vallot C, Castoro RJ, Chung W, Maegawa S, Oki Y, Kondo Y, Jelinek J, Shen L, Hartung H, Aplan PD, Czerniak BA, Liang S, Issa JP: Genome architecture marked by retrotransposons modulates predisposition to DNA methylation in cancer. Genome Res. 2010, 20: 1369-82. 10.1101/gr.107318.110.
Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, Schroder K, Cloonan N, Steptoe AL, Lassmann T, Waki K, Hornig N, Arakawa T, Takahashi H, Kawai J, Forrest AR, Suzuki H, Hayashizaki Y, Hume DA, Orlando V, Grimmond SM, Carninci P: The regulated retrotransposon transcriptome of mammalian cells. Nat Genet. 2009, 41: 563-571. 10.1038/ng.368.
Lev-Maor G, Sorek R, Shomron N, Ast G: The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons. Science. 2003, 300: 1288-1291. 10.1126/science.1082588.
This work was supported by a grant to SB from Indian Council of Medical Research and Department of Biotechnology, to JP from Department of Biotechnology and to LRI from Department of Science and Technology, under the Women Scientists Scheme (WOS-A), Government of India. VK received a fellowship from Council of Scientific and Industrial Research, India. Grateful thanks are extended to Dr. Rashidul Haque for providing strain MS96-3382. We acknowledge Gareth Weedall and Neil Hall, Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK for Genome Assembly of E. histolytica strain Rahman; J. Craig Venter Institute and the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, USA for E. histolytica strains KU50, MS96, KU27, KU48, HM-1:CA and DS4, and Lis Caler at the J. Craig Venter Institute for E. histolytica HM-1:IMSS sequence and annotation.
The authors declare that they have no competing interests.
Conceived and designed the experiments: SB and VK. Performed the experiments: VK, VB, SP. Analyzed the data: SB, AB, VK. Computational work done: RR, VK, VB, SP. Contributed reagents/materials/analysis tools: SB, LRI, JP, JJV, CGC. Wrote the paper: SB, AB, VK. Principal investigator: SB. Reviewed and commented on the paper: CGC, AB, JP, LRI, JJV. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Figure S1: Description: Schematic representation of flanking genes, EhSINE1/EhSINE2, and position of primers on the E. histolytica HM-1:IMSS scaffolds containing loci 13, 17, 19, 42, 18 and 50. The thin line represents the scaffold, arrowheads denote the different primers, solid boxes represent genes, hollow boxes represent a EhSINE (arrow indicates orientation) and the grey box denotes any repetitive element other than a SINE. Numbers on vertical lines indicate the position of genes and EhSINE on the scaffold. (PDF 26 KB)
Additional file 2: Table S1: Description: Expected amplicon size with each primer pair from genome assemblies. (PDF 54 KB)
Additional file 3: Figure S2: Description: Analysis of locus 42: Locus 42 was amplified from the genomic DNA of E. histolytica HM-1:IMSS and Rahman with the locus-specific primers followed by Southern blotting and hybridization with a locus 42-specific probe (3.7 kb amplicon from the genomic DNA of HM-1:IMSS). (PDF 24 KB)
Additional file 4: Table S2: Description: Detailed analysis of loci 13, 17, 19, 42, 18 and 50 in sequenced strains (AmoebaDB). (PDF 65 KB)
Additional file 5: Figure S4: Description: Schematic representation of locus 17 in HM-1:IMSS and Rahman (AmoebaDB): Intact, dotted and broken lines, hollow boxes and arrowheads represent similar features to those described in Additional file 7: Figure S3. Scaffold DS571247 contains locus 17 of HM-1:IMSS. The corresponding locus in Rahman is present in EhRm_scaffold00561. The EhSINE1 region, including 300 bp of upstream sequence, in HM-1:IMSS is undefined in Rahman (represented by a thin dotted line). A stretch of 84 bp of EhSINE1 from the 5′ end was retained in Rahman (represented by a small hollow box). As mentioned in the text and Figure 5, the assembly of the Rahman sequence at the SINE region is erroneous in the database. In fact the entire EhSINE1 sequence is missing in Rahman. (PDF 61 KB)
Additional file 6: Figure S5: Description: Schematic representation of locus 19 in HM-1:IMSS and Rahman (AmoebaDB): Intact, dotted and broken lines, hollow boxes and arrowheads represent similar features to those described in Additional file 7: Figure S3. Scaffold DS571226 contains locus 19 of HM-1:IMSS. The corresponding Rahman locus is present in one major scaffold (EhRm_scaffold00536) and three small unassembled contigs (EhRm_contig00303, EhRm_contig00523, EhRm_contig21711), which are represented by red, purple and blue lines and a green box respectively. EhRm_scaffold00536 has a large undefined region (Ns) where these small contigs are located. (PDF 71 KB)
Additional file 7: Figure S3: Description: Schematic representation of locus 42 in HM-1:IMSS and Rahman (AmoebaDB): Intact lines represent regions that show homology in the two strains (some mismatches have been ignored). The dotted line represents the missing EhSINE1 sequence in Rahman and the hollow box represents EhSINE1 in HM-1:IMSS. The black line represents the scaffold containing locus 42 of HM-1:IMSS. Red and purple lines and the green box represent EhRm_scaffold00892, EhRm_scaffold00027 and EhRm_contig21200, respectively, which contain the corresponding locus in Rahman. Boxes represent the upstream hypothetical protein and downstream mannosyltransferase protein genes. Arrowheads represent the primers; G represents the last nucleotide of the primer (the position of which is indicated in the HM-1:IMSS scaffold), while C represents the mismatched nucleotide at the respective position in Rahman. The blue arrowhead shows the proposed position of the primer in the Rahman scaffold where it may anneal to give the observed amplicon (~5.2 kb); ACG (blue) represents the last 3 nucleotides of 42.1 F matching this position in the Rahman scaffold. Downstream of EhSINE1 there is a truncated 1.2 kb EhLINE1 sequence which is partly present in two scaffolds of Rahman. Numbers above and below the lines represent the respective positions in the scaffolds/contigs of HM-1:IMSS and Rahman, as well as identifying the position of EhSINE1, genes and the other repetitive region in the loci in the two genomes. Broken lines at the end of a scaffold indicate the further extension of scaffolds beyond the region depicted. (PDF 79 KB)
About this article
Cite this article
Kumari, V., Iyer, L.R., Roy, R. et al. Genomic distribution of SINEs in Entamoeba histolytica strains: implication for genotyping. BMC Genomics 14, 432 (2013). https://doi.org/10.1186/1471-2164-14-432
- Entamoeba histolytica
- SINE occupancy
- Strain typing