Small interfering RNA-producing loci in the ancient parasitic eukaryote Trypanosoma brucei
© Tschudi et al.; licensee BioMed Central Ltd. 2012
Received: 25 March 2012
Accepted: 24 August 2012
Published: 27 August 2012
At the core of the RNA interference (RNAi) pathway in Trypanosoma brucei is a single Argonaute protein, TbAGO1, with an established role in controlling retroposon and repeat transcripts. Recent evidence from higher eukaryotes suggests that a variety of genomic sequences with the potential to produce double-stranded RNA are sources for small interfering RNAs (siRNAs).
To test whether such endogenous siRNAs are present in T. brucei and to probe the individual role of the two Dicer-like enzymes, we affinity purified TbAGO1 from wild-type procyclic trypanosomes, as well as from cells deficient in the cytoplasmic (TbDCL1) or nuclear (TbDCL2) Dicer, and subjected the bound RNAs to Illumina high-throughput sequencing. In wild-type cells the majority of reads originated from two classes of retroposons. We also considerably expanded the repertoire of trypanosome siRNAs to encompass a family of 147-bp satellite-like repeats, many of the regions where RNA polymerase II transcription converges, large inverted repeats and two pseudogenes. Production of these newly described siRNAs is strictly dependent on the nuclear DCL2. Notably, our data indicate that putative centromeric regions, excluding the CIR147 repeats, are not a significant source for endogenous siRNAs.
Our data suggest that endogenous RNAi targets may be as evolutionarily old as the mechanism itself.
The “natural” or endogenous RNA interference (RNAi) pathway functions as a genome immune defense mechanism that maintains genome integrity, prevents viral infection and limits the potentially deleterious consequences of transposon/retroposon mobilization [1–3]. Whereas deep sequencing of endogenous small interfering RNAs (siRNAs) has confirmed that the large majority of siRNAs in several model organisms, including insects , plants  and mammals [6, 7], are indeed derived from retroposons and transposons, these studies also uncovered new classes of small RNAs originating, among others, from regions of the genome where convergent transcription occurs or from loci predicted to generate double-stranded RNA (dsRNA) by a fold-back mechanism, i.e. inverted repeats. In addition, tRNA- and snoRNA-derived RNA fragments have recently been added to the catalogue of small RNAs implicated in RNAi-related gene silencing pathways [8–12].
The processing of dsRNA is executed by a member of the Dicer family of RNase III-related enzymes to yield a variety of 21–30 nt small dsRNAs that are then loaded into a specific class of Argonaute (AGO)-family proteins [13, 14]. RNAi was first described in the parasitic protozoan Trypanosoma brucei in 1998  and to date we know that there are two distinct Dicer-like enzymes in these organisms, namely TbDCL1  and TbDCL2 , which are mostly present in the cytoplasm and in the nucleus, respectively. siRNAs generated by both Dicers are transferred to TbAGO1, the sole slicer responsible for cleavage of target transcripts [18, 19] with the assistance of TbRIF4, a 3’-5’ exonuclease .
Soon after the discovery of RNAi in T. brucei, “old-fashioned” sequencing of a 20–30 nt RNA library revealed that the two main classes of retroposons, namely ingi and SLACS, were sources of siRNAs , and subsequent sequencing of siRNAs derived from TbAGO1 immunoprecipitates uncovered a new class of siRNAs from 147-bp tandem units , which we named CIR147 repeats (Chromosome Internal Repeats). These satellite-like repeats are located in non-telomeric regions of T. brucei chromosomes 4, 5, and 8 and were independently identified as part of putative centromeric regions , although at present we cannot exclude the possibility that these repeats also exist on the other chromosomes. In addition, functional studies were consistent with a major role for endogenous RNAi in T. brucei in maintaining genome integrity [17, 23]. On the other hand, the unique organization of the trypanosome genome into long directional gene clusters with many sites of convergent transcription  raised the possibility of additional sources of siRNAs and may thus point to new role(s) for RNAi in these organisms. We therefore used deep sequencing to survey the small RNAs associated with Argonaute 1 in wild-type cells, as well as in cells devoid of either one of the two Dicers, to gauge their respective roles in small RNA production.
[Table: Summary of total small RNA reads (reads mapped to the 11 megabase chromosomes)]
[Table: Summary of small RNA classes in procyclic T. brucei]
SLACS, or Spliced Leader Associated Conserved Sequence, is a site-specific non-LTR retroposon that integrates exclusively into the spliced leader (SL) RNA gene . Our wild-type YTat 1.1 strain, which was used for the small RNA sequencing described here, has between 16 and 18 copies of the SLACS element per haploid genome (ref.  and data not shown). The element is 6.8 kb long and the two predicted ORFs encode a putative reverse transcriptase and endonuclease domain (Figure 2B). In the wild-type library 26% of the total reads (16% of the unique reads) aligned to this retroposon, with approximately twice as many reads originating from the antisense strand. However, in contrast to ingi, siRNAs were restricted to the 5’ half of the element with very few reads in the 3’ half (Figure 2B). Consistent with our analysis of siRNA abundances in TbDCL1 and TbDCL2 KO cells by Northern blots , siRNA numbers in cells lacking TbDCL1 were slightly lower, but were reduced to 14% in TbDCL2 KO cells (Table 2). The uneven distribution of siRNAs was maintained in both KO libraries.
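The strand bias and the 5’ versus 3’ distribution described above can be tallied directly from read alignments. The following is a minimal Python sketch, not the procedure used in this study: it assumes each aligned read is given as a hypothetical (position, strand) pair in element coordinates, with "+" meaning sense and "-" meaning antisense, and the function name and toy data are illustrative only.

```python
def summarize_alignments(alignments, element_length):
    """Count reads by strand and by element half.

    alignments: iterable of (position, strand) pairs, where position is the
    1-based start of the read on the element and strand is "+" or "-".
    """
    summary = {"sense": 0, "antisense": 0,
               "five_prime_half": 0, "three_prime_half": 0}
    midpoint = element_length / 2
    for position, strand in alignments:
        # Strand tally: "+" is treated as sense, anything else as antisense.
        summary["sense" if strand == "+" else "antisense"] += 1
        # Positional tally: reads starting at or before the midpoint count as 5' half.
        half = "five_prime_half" if position <= midpoint else "three_prime_half"
        summary[half] += 1
    return summary


# Toy alignments on a 6.8-kb element (made-up positions, for illustration only).
toy = [(100, "-"), (500, "-"), (1200, "+"), (2000, "-"), (3000, "-"), (5000, "+")]
result = summarize_alignments(toy, element_length=6800)
print(result)
```

With real data the same tallies would be computed per library, so that wild-type, TbDCL1 KO and TbDCL2 KO distributions can be compared side by side.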
In the wild-type library 8.6% of the total reads (586,988 reads) aligned to the CIR147 tandem repeats (Table 2). The abundance of CIR147 siRNAs was reduced slightly to 5.7% in the absence of TbDCL1, whereas in the DCL2KO library only 0.2% of the siRNAs originated from these repeats, thus confirming our functional studies that TbDCL2 has a primary role in the generation of siRNAs from CIR147 repeats .
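The read count and percentage quoted for CIR147 in the wild-type library are linked by simple arithmetic. As a rough consistency check, and assuming the 8.6% figure is rounded to one decimal place, the Python sketch below back-calculates the range of total reads implied by 586,988 CIR147 reads. The function name is hypothetical and the result is an estimate, not a value taken from the tables.

```python
def implied_total_reads(class_reads, percent, rounding_step=0.1):
    """Return (low, high) bounds on library size implied by a rounded percentage."""
    half = rounding_step / 2
    # The true percentage lies within +/- half a rounding step of the quoted value.
    low = class_reads / ((percent + half) / 100)
    high = class_reads / ((percent - half) / 100)
    return low, high


low, high = implied_total_reads(586_988, 8.6)
print(f"Implied wild-type library size: {low:,.0f} to {high:,.0f} reads")
```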
CIR147 repeats on chromosomes 4, 5 and 8 were previously identified as part of putative T. brucei centromeric regions based on etoposide-mediated topoisomerase-II cleavage . Since centromeres in the fission yeast Schizosaccharomyces pombe are under the control of the RNAi pathway , we surveyed other predicted centromeric regions in T. brucei for the production of small RNAs. However, none of the putative centromeric regions on chromosomes 1, 2, 3, 6 and 7  showed a significant accumulation of reads as compared to flanking regions (data not shown). Of note, putative centromeres were not mapped on the three largest chromosomes, i.e. 9, 10 and 11 . The lack of reads at the putative centromeres on chromosomes 2, 3 and 7 was particularly intriguing, since they contain large arrays of tandem repeats of 29, 120 and 59 base pairs, respectively (ref.  and data not shown), suggesting that short tandem repeats per se are not destined to be a source of small RNAs in procyclic T. brucei. To test this hypothesis, we used the program Tandem Repeats Finder , tabulated all repeat arrays on the 11 megabase chromosomes (data not shown) and then manually inspected repeats with a size >10 bp and a copy number >10 for aligned reads. Although many additional tandem repeats were identified, none was found to be a source of small RNAs (data not shown). Thus, it appeared from this analysis that the CIR147 repeats are the only tandem repeats in the T. brucei genome generating siRNAs and that putative centromeric regions not harboring CIR147 repeats are devoid of siRNAs.
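The repeat survey above combines a threshold filter (unit size >10 bp, copy number >10) with an inspection of aligned reads. The inspection in this study was manual; the Python sketch below only illustrates how the same two steps could be automated. The RepeatArray record, the function names and all coordinates are hypothetical, and the sketch does not parse actual Tandem Repeats Finder output.

```python
from dataclasses import dataclass


@dataclass
class RepeatArray:
    chromosome: str
    start: int      # 1-based start of the array
    end: int        # 1-based end of the array (inclusive)
    period: int     # repeat unit size in bp
    copies: float   # copy number of the unit


def select_candidates(arrays, min_period=10, min_copies=10):
    """Keep arrays with unit size > min_period and copy number > min_copies."""
    return [a for a in arrays if a.period > min_period and a.copies > min_copies]


def reads_in_array(array, read_starts):
    """Count reads whose start position falls inside the array.

    read_starts: iterable of (chromosome, position) pairs.
    """
    return sum(
        1
        for chrom, pos in read_starts
        if chrom == array.chromosome and array.start <= pos <= array.end
    )


# Made-up arrays and read positions, for illustration only.
arrays = [
    RepeatArray("chr4", 1000, 5410, 147, 30.0),
    RepeatArray("chr2", 2000, 2290, 29, 10.0),   # copy number not > 10, dropped
    RepeatArray("chr1", 500, 620, 8, 15.0),      # unit size not > 10, dropped
]
reads = [("chr4", 1500), ("chr4", 5410), ("chr4", 9000), ("chr2", 2100)]

for array in select_candidates(arrays):
    print(array.chromosome, array.period, reads_in_array(array, reads))
```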
Small RNAs at long inverted repeats in T. brucei
Given the wide distribution of RNAi and related phenomena in all branches of the eukaryotic lineage, a case can be made that the RNAi mechanism originated early during eukaryotic evolution and, as an extension, the repertoire of small RNAs generated by the RNAi pathway should have an evolutionary history. To explore this hypothesis, we have been studying the mechanism and biological function of RNAi in the ancient parasitic protozoan T. brucei. This led to the realization several years ago that the two retroposons in the genome, namely ingi and SLACS, were a source of siRNAs, thus implicating RNAi in maintaining genome integrity . The next observation we made was that a subclass of putative centromeres containing 147-bp tandem repeats was also generating siRNAs . In the current study we aimed to annotate all small RNA-producing loci in insect-form derived trypanosomes by deep-sequencing RNAs associated with the sole Argonaute 1 slicer. In addition, to further our understanding of the specific role of the two Dicers in this organism, we surveyed small RNAs in cells deficient in either the nuclear or cytoplasmic Dicer.
Our results exposed a number of intriguing features and thus raise numerous questions. Firstly, siRNAs originating from SLACS were restricted to the 5’ half of the element (Figure 2B), suggesting a possible avenue for the generation of double-stranded RNA. Our studies on the expression of the SLACS element suggested that transcription initiates at the +1 position of the interrupted spliced leader RNA gene and continues through the 5’ UTR and ORF1 . In addition, preliminary experiments indicated a detectable amount of antisense transcription, although the low level of transcription made it impossible to pinpoint the extent of transcription, let alone the site of transcription initiation . Nevertheless, having a defined landmark from our read alignments, it might be possible in future experiments to delineate the origin of the double-stranded RNA.
Secondly, the CIR147 repeats present in putative centromeric regions on three chromosomes were the only tandem repeats in the genome giving rise to small RNAs, despite the presence of other short tandem repeats either in putative centromeres or other genomic regions. What is so special about the CIR147 repeats that they are under the control of the RNAi pathway and how are siRNAs, i.e. dsRNA, produced from these arrays? At present there is no knowledge of what constitutes a centromere in trypanosomes and whether heterochromatin is crucial for centromere function. More importantly, a very basic question to put forward is whether the silent transcriptional status of the CIR147 repeats in wild-type cells  is caused by a heterochromatic state of the locus. In Drosophila and in yeast, H2AZ was reported to be involved in heterochromatic silencing [40, 41]. This histone variant has been characterized in T. brucei and shown to function in concert with a novel H2B variant, H2BV . By ChIP, both proteins were shown to be associated with highly repetitive DNA, including the mini-chromosomal 177-bp repeats, the expression site-proximal 50-bp repeats, and telomeric repeats. Intriguingly, H2AZ and H2BV do not co-localize with sites of nascent RNA transcription, suggesting that they are primarily enriched at transcriptionally inactive regions of the genome. Given our data, it is tempting to speculate that these two histone variants might be involved in the regulation of the transcriptional status of the 147-bp repeat clusters, a possibility we are currently investigating. Taking advantage of RNAi-deficient cells, we know that both strands of the CIR147 repeats generate transcripts of over 10 kb and based on α-amanitin sensitivity are synthesized by RNA polymerase II . 
Although the latter observation might be in line with studies in other organisms, where it has been shown that RNA polymerase II appears to play a direct role in heterochromatin assembly , the assembly of specialized chromatin domains in T. brucei and a possible connection with RNAi remains shrouded in mystery.
Thirdly, we identified 7 large inverted repeats in the T. brucei genome and all generated small RNAs, albeit to different levels. In themselves these long inverted repeats are a curiosity, since in many organisms such structures can have a serious effect on genome stability . In particular, the remarkably high sequence identity suggests some selective pressure, perhaps relying on the formation of a hairpin structure at the DNA or RNA level. It is tempting to speculate that RNAi might play a role in maintaining these repeats.
Fourthly, our data highlighted two annotated pseudogenes that are a source of small RNAs, whereas the majority of pseudogenes, i.e. VSGs and ESAGs, do not generate small RNAs in procyclic trypanosomes. In addition, TbDCL2, and not TbDCL1, appeared to be mainly responsible for these pseudogene-derived small RNAs. Our results seem to be in contrast with a recent study in bloodstream-form trypanosomes, where small RNAs originating mainly from VSG and RHS pseudogenes were found to be dependent on TbDCL1 . However, since we have noted differences in the contribution of the two Dicers to the generation of small RNAs from inverted repeats (Table 3), one cannot exclude a similar scenario for pseudogene-derived small RNAs, especially considering that the two studies were done in different life-cycle stages.
Lastly, we detected siRNAs at all convergent transcription units (CTUs), although the distribution of the small RNAs varied greatly. At present we can only speculate about the functional significance of siRNAs originating from CTUs. Our analysis of steady-state mRNA levels at two selected CTUs (Figure 6) in wild-type and DCL2KO cells would indicate that RNAi does not play a role in the modulation of mRNA levels at these loci in procyclic cells. However, it is possible that the number of siRNAs originating from CTUs, namely 8,088 and 7,394 for the two we tested, is too low to have a direct effect on gene expression. Alternatively, siRNAs from CTUs could induce DNA or chromatin modifications at the homologous genomic locus. Another open question is the origin of these siRNAs, i.e. how the dsRNA is generated in the absence of evidence that the converging transcription units overlap. At present it is not known how and where transcription terminates at CTUs, although the presence of siRNAs at CTUs might suggest that transcription proceeds to some extent into the adjacent gene cluster. In support of this hypothesis, our recent transcriptome studies using RNA-Seq  revealed a low level of antisense transcription at CTUs (unpublished observation), providing a possible avenue for the formation of dsRNA. Experiments are ongoing to corroborate this scenario and to investigate the uneven distribution of small RNAs among CTUs.
The results presented here confirm our earlier sequencing and functional studies showing that retroposon- and CIR147 repeat-derived siRNAs represent the major sources of small RNAs, and expand the repertoire to include small RNAs originating from inverted repeats, pseudogenes and loci of converging transcription units. At the same time, our data set derived from procyclic form trypanosomes did not reveal the presence of miRNAs, or of tRNA- or snoRNA-derived RNA fragments generated by the RNAi pathway. However, this conclusion does not rule out the possibility that other stages of the trypanosome life cycle might generate a different set of small RNAs.
Our data revealed a dominant role for the nuclear TbDCL2 in the endogenous T. brucei RNAi pathway and the landscape of siRNAs in this ancient eukaryotic parasite closely mirrors that described in metazoans, such as Drosophila and mouse, attesting to the early evolutionary origin of RNAi.
Previously published procedures were followed for culturing trypanosome YTat 1.1 cells , generation of knockout cell lines by PCR-based methods  and Northern blots of total RNA . Oligonucleotides used to prepare probes for Northern blots are listed in Additional file 5.
TAP-tagged TbAGO1 was expressed in wild-type cells , as well as in TbDCL1KO and TbDCL2KO cells (this study). Following TbAGO1 affinity purification, the bound small RNAs were prepared for sequencing. For library construction we essentially followed the protocol suggested by the manufacturer (http://www.illumina.com/).
Libraries were sequenced on an Illumina GAII platform at the Yale Center for Genome Analysis and the reads of 35 nt in length were pre-processed using the FASTX-toolkit on the public Galaxy webserver ([48–50]; http://galaxyproject.org/). Following removal of the adaptor sequences, reads were collapsed to non-redundant datasets, and short reads (<= 17 bp) and low complexity reads (including poly A, C, G or T, di-nucleotide repeats, etc.) were removed. We mapped processed reads, both total and non-redundant reads, to the T. brucei 11 megabase chromosomes (GeneDB version 5) using Bowtie  and the Lasergene 9.1 software package from DNASTAR (http://www.dnastar.com/).
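The pre-processing steps above (adaptor removal, length and complexity filtering, collapsing to non-redundant reads) were carried out with the FASTX-toolkit on Galaxy. The Python sketch below is not that pipeline; it only illustrates the logic of the steps under simplifying assumptions. The adaptor sequence is a made-up placeholder, adaptor matching is exact, and the low-complexity test covers only homopolymers and pure di-nucleotide repeats.

```python
from collections import Counter

# Hypothetical adaptor sequence, used only for this illustration.
ADAPTOR = "TCGTATGCCG"
MIN_LENGTH = 18  # reads of 17 nt or shorter are discarded


def trim_adaptor(read, adaptor=ADAPTOR):
    """Cut the read at the first exact occurrence of the adaptor, if any."""
    idx = read.find(adaptor)
    return read if idx == -1 else read[:idx]


def is_low_complexity(read):
    """Flag homopolymers and pure di-nucleotide repeats (e.g. ACACAC...)."""
    if len(set(read)) <= 1:
        return True
    unit = read[:2]
    repeated = (unit * (len(read) // 2 + 1))[:len(read)]
    return read == repeated


def preprocess(reads):
    """Return a Counter mapping each retained unique sequence to its read count."""
    counts = Counter()
    for raw in reads:
        read = trim_adaptor(raw.upper())
        if len(read) < MIN_LENGTH or is_low_complexity(read):
            continue
        counts[read] += 1
    return counts


# Toy reads: two copies of one insert, a homopolymer, a short insert, a di-nucleotide repeat.
insert = "ACGTTGCAAGCTTAGGCTAACGT"
toy_reads = [insert + ADAPTOR + "AA", insert + ADAPTOR, "A" * 25,
             "ACGT" + ADAPTOR, "AC" * 12]
collapsed = preprocess(toy_reads)
print(len(collapsed), "unique sequences;", sum(collapsed.values()), "total reads")
```

Keeping the counts alongside the unique sequences lets both the total and the non-redundant read sets be mapped from the same collapsed table.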
This work was supported by Public Health Service grants AI28798 and AI56333 to E.U.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.