Integrative analysis of the Trypanosoma brucei gene expression cascade predicts differential regulation of mRNA processing and unusual control of ribosomal protein expression
© Antwi et al. 2016
Received: 19 January 2016
Accepted: 16 April 2016
Published: 26 April 2016
Trypanosoma brucei is a unicellular parasite which multiplies in mammals (bloodstream form) and Tsetse flies (procyclic form). Trypanosome RNA polymerase II transcription is polycistronic, individual mRNAs being excised by trans splicing and polyadenylation. We previously made detailed measurements of mRNA half-lives in bloodstream and procyclic forms, and developed a mathematical model of gene expression for bloodstream forms. At the whole transcriptome level, many bloodstream-form mRNAs were less abundant than was predicted by the model.
We refined the published mathematical model and extended it to the procyclic form. We used the model, together with known mRNA half-lives, to predict the abundances of individual mRNAs, assuming rapid, unregulated mRNA processing; then we compared the results with measured mRNA abundances. Remarkably, the abundances of most mRNAs in procyclic forms are predicted quite well by the model, being largely explained by variations in mRNA decay rates and length. In bloodstream forms substantially more mRNAs are less abundant than predicted. We list mRNAs that are likely to show particularly slow or inefficient processing, either in both forms or with developmental regulation. We also measured ribosome occupancies of all mRNAs in trypanosomes grown in the same conditions as were used to measure mRNA turnover. In procyclic forms there was a weak positive correlation between ribosome density and mRNA half-life, suggesting cross-talk between translation and mRNA decay; ribosome density was related to the proportion of the mRNA on polysomes, indicating control of translation initiation. Ribosomal protein mRNAs in procyclics appeared to be exceptionally rapidly processed but poorly translated.
Levels of mRNAs in procyclic form trypanosomes are determined mainly by length and mRNA decay, with some control of precursor processing. In bloodstream forms variations in nuclear events play a larger role in transcriptome regulation, suggesting aquisition of new control mechanisms during adaptation to mammalian parasitism.
Organisms of the family Kinetoplastidea are unicellar flagellated eukaryotes which can be free-living or parasitic. One of the most remarkable features of Kinetoplastid molecular biology is the arrangement of protein-coding genes in polycistronic transcription units [1–4], which precludes control of transcription at the level of individual mRNAs. All evidence obtained to date indicates that trancription initiation is determined by chromatin modification . During exponential growth, differences in initiation between the different polymerase II transcription units have not been observed. The only documented polymerase II control is global down-regulation of initiation upon heat shock or the attainment of stationary phase; this impacts open reading frames at the beginning of transcription units earlier than those at the ends . The lack of transcriptional control at the level of individual genes means that differences in steady-state gene expression must be effected post-transcriptionally. The most extensive analyses of gene expression have been for the African trypanosome Trypanosoma brucei, which is the subject of this paper. Nearly all studies of in vitro cultured T. brucei have concentrated on the bloodstream form (similar to the form that multiplies in mammalian blood) and the procyclic form (similar to the form that multiplies in the midgut of Tsetse flies). In T. brucei, the transcription units that include the genes encoding major surface proteins of the bloodstream and procyclic forms - variant surface glycoproteins and the procyclins - are important exceptions to the general rule: they are transcribed in a developmentally-regulated fashion by RNA polymerase I .
Mature mRNAs are exported to the cytosol, where they are translated and/or degraded (Fig. 1). There is much more quantitative information available about these cytosolic events than there is about events in the nucleus. Numerous studies have documented differences in the decay rates of individual mRNAs both at steady state, and between different life-cycle stages . The “Silicotryp” project aimed to quantify many aspects of trypanosome gene expression and metabolism, and to develop mathematical models that reflected the biology of the organism . In the context of that project, using high-throughput cDNA sequencing (RNA-Seq), we completed transcriptome-wide measurements of mRNA decay rates in cultured bloodstream and procyclic forms . Half-lives varied from less than 5 min to more than 4 h (beyond the capacity of the assay), with medians of 12 min for bloodstream forms and 20 min for procyclic forms. Levels of individual mRNAs varied from undetectable to over 100 mRNAs/cell/gene, with medians of 1.1 (procyclic forms) and 1.5 (bloodstream forms) . We and others also measured ribosome occupancy of mRNA by ribosome profiling, finding a range from tightest possible packing to less than one ribosome per mRNA [20, 21]. The results thus indicate pervasive regulation of both mRNA decay and translation.
Mathematical modelling of gene expression can help us to understand how the kinetics of different steps influence mRNA levels [22–24]. Our first model of T. brucei gene expression was made for the mRNAs encoding phosphoglycerate kinase isoforms . The model consists of a series of differential equations that describe transcription, mRNA trans splicing, and degradation of both the mRNA precursor and the mature mRNA (Fig. 1). For phosphoglycerate kinase, all parameters except the transcription rate were measured; the model then enabled us to estimate the transcription rate for this polycistronic unit . The availability of accurate transcriptome-wide mRNA abundance and decay measurements  meant that we could test whether the model that had been developed for phosphoglycerate kinase could be applied transcriptome-wide. To do this, we simulated ‘virtual’ transcripts, rather than doing one-to-one comparisons with real individual mRNAs . The cell division time is known, and we assumed that all genes are transcribed at the same rate, as estimated from the phosphoglycerate kinase example. The major unknown parameter was mRNA processing: for the sake of modelling, we therefore allowed this rate to vary within narrowly-defined limits, determined by a very small number of published individual measurements. Our initial analyses revealed that our model gave a distribution that was very similar to that measured for short mRNAs, but longer mRNAs had lower steady-state abundances than was predicted based on their half-lives. We therefore included length-dependent co-transcriptional degradation of precursor RNAs in the model  (Fig. 1). This is consistent with the idea that the longer the mRNA spends as a precursor, the more likely it is to be subjected to nuclear precursor degradation. The length correction considerably improved the fit of the model to the data.
A major use of mathematical models is to identify the existence of unknown factors or parameters. The extent to which a model fits the data depends on (a) the accuracy of the model and (b) the accuracy of the data that has been fed into the model. In our case, one of the most obvious data deficiencies was the absence, for almost all genes, of information concerning splicing and polyadenylation kinetics. We therefore set out to use the model to identify mRNAs which showed unusually slow, or stage-regulated, splicing. To do this, we first refined the model for bloodstream trypanosomes, and adjusted the parameters to make the model suitable for procyclic trypanosomes. Then, for each individual mRNA, we predicted the abundance using the known length and decay rate. For three quarters of procyclic form mRNAs, the predicted and measured abundances differed by less than a factor of 2. The comparison thus enabled us to identify the mRNAs that are less abundant than expected, and are therefore likely to be poorly processed. In addition, we measured the ribosome occupancy of mRNAs under exactly the same conditions as were previously used for the mRNA decay measurements. The combined results enable us to assess the roles of mRNA processing, decay and translation in the developmental regulation of gene expression.
Parameters used in the different models
No genes examined
μ (min−1) Division time
0.0019 (6 h)
0.0019 (6 h)
0.0019 (6 h)
0.0019 (6 h)
0.0019 (6 h)
0.0019 (6 h)
0.0011 (10.5 h)
0.0011 (10.5 h)
v (molecules.cell−1 .min−1)
Length adjustment α
z = 600
z = 600
z = 600
z = 660
z = 600
Correlation coefficient (r)
The transcription rate and mRNA length factor z were both optimised using the Nelder-Mead optimization algorithm, in order to most closely mirror the abundance distribution. As an alternative way to measure the influence of mRNA length on half-life, we re-calculated it as follows: (a) we modelled the expected amount of mRNA without applying any length correction; (b) we divided the measured mRNA abundance by the modelled abundance from (a); (c) we fitted an optimal curve. All of this optimisation was done to fit the overall distribution, rather than for individual genes. Results from the various models are in Additional file 10: Figure S1, Additional file 11: Figure S2 and Additional file 12: Figure S3.
Motif searches were done using DREME , MEME  and to analyse pyrimidine patterns, a custom script (Additional file 13). 5′-region comparisons compared 62 mRNAs that had abundances of less than 0.25-fold the predicted value in both forms with 723 mRNAs with abundances of 1.5-fold - 2-fold predicted values. Polypyrimidine tract analysis focused on the 70 nt upstream of the principal splice acceptor, and to reduce the likelihood of considering mis-annotated sites, we considered only mRNAs with annotated 5′-UTRs between 10 and 300 nt. To look for 3′-UTR motifs that affect degradation rates, we tried multiple different categories: for example, mRNAs that are very unstable and stabilised by depletion of XRNA in bloodstream but not procyclic forms, and vice versa; mRNAs that are stable in all stages; and various other groupings.
Trypanosome culture and ribosome profiling analysis
Trypanosomes were cultured exactly as described in . For ribosome profiling, RNA was made using Trizol and trypanosome pellets were snap frozen; both were shipped on dry ice. Ribosome profiling was done exactly as described in . To determine mRNA levels, the mRNA was poly(A) selected, fragmented to pieces of 30–70 nt, then made into cDNA as described previously. It was not possible to conduct a ribosomal RNA depletion as described previously, because the “ribo-minus” kit that had been used is no longer available and the new version is unsuitable for trypanosome RNA. Comparative analyses concentrated on a set of genes that represented each unique open reading frame [9, 16].
To analyse polysomal RNA, trypanosome extracts were subjected to sucrose density gradient centrifugation . RNA was prepared from the fractions that were denser than monosomes; the monosomes; and lighter fractions. A portion was used for Northern blotting with the spliced leader as probe, to quantify the proportion of mRNA in the polysomes. Ribosomal RNA was depleted from another aliquot by incubation with oligonucleotides complementary to trypanosome rRNA together with with RNase H, and the RNA was sequenced as described previously. Further details of this polysomal RNA analyses will be published elsewhere.
Alignments were done using custom scripts, as previously described [28, 29]. Statistical analyses were done in R or RStudio. To find differences between datasets we used DeSeq . Plots were made using R and Microsoft Excel, then re-formatted in Adobe Illustrator.
Adjustments to the gene expression model and adaptation to procyclic forms
In our previous publication, our model for bloodstream-form gene expression was used to simulate transcriptome-wide mRNA abundances, using half-life and mRNA length distributions that matched the measured values . We did not, however, attempt to analyse predictions for individual mRNAs. Before doing that, we decided to examine various details of the model in order to make sure that it reflected the data to the best extent possible. Since we have data for mRNA half-lives, the parameters that needed to be optimised were the transcription rate, splicing kinetics, polyadenylation kinetics and the relationship between mRNA length and precursor decay. To optimise the model we compared its output with data for mRNAs for which we have reliable measurements of gene copy number, mRNA abundance, mRNA length, and mRNA half-life. After each model adjustment, we first compared the predicted population distribution with the real population distribution. Results are shown in the left-hand panels of Additional file 10: Figure S1, Additional file 11: Figure S2 and Additional file 12: Figure S3. Next, for each individual mRNA, we compared the simulated mRNA abundance with the measured abundance: these results are in the right-hand panels of Additional file 10: Figure S1, Additional file 11: Figure S2 and Additional file 12: Figure S3. Table 1 shows the changes made in the different model versions, with Pearson correlation coefficients at the level of individual mRNAs.
In the next step, we looked at processing in bloodstream forms. Reliable splicing kinetic measurements are available for just 3 genes: a half time of 1.7 min for the PGK precursor in bloodstream forms , and similar values for HSP70 in bloodstream forms  and for alpha and beta tubulin in procyclic forms . We first used the published bloodstream-form model  with no modifications (Table 1, column BS-A). We inputted the half-life and mRNA length distributions from 851 genes (for the choice see below), and set splicing and polyadenylation half-times to 1.7 min. The Pearson correlation coefficient between predicted mRNA abundance and real abundance, at the level of individual mRNAs, was 0.59 (Table 1).
In our previous RNASeq analyses, we had attempted to measure bloodstream-form splicing half-times transcriptome-wide, using as a surrogate the rate at which the precursor disappeared . We did not know whether these measurements had any validity because the numbers of RNASeq reads from precursors were low, and many mRNAs have several splice acceptor sites. Nevertheless, for model optimisation, we focused on 851 genes for which we had estimated splicing half-times of between 1 and 5 min. We hoped that in this way, we would be excluding mRNAs with very slow processing, since this might interfere with model optimisation. Instead of using a constant splicing time, we tried inserting the values for processing kinetics that had been measured by RNASeq (Table 1, columns BS-B and BS-C; Additional file 10: Figure S1). The correlation was not improved relative to BS-A, which confirmed our previous suspicions that our attempted transcriptome-wide measurements of splicing rates for individual genes were too inaccurate to use in modelling. We therefore decided, for subsequent models, to use a fixed splicing time of 1.7 min. We started from model BS-A and optimised the transcription rate (v) and the length correction (z-factor) simultaneously using the Nelder-Mead optimization function (Table 1, column BS-D; Additional file 10: Figure S1). This yielded a similar fit to model BS-A, both for the overall distribution and for individual mRNAs.
The last parameter to be considered was the relation between length and abundance. We needed to assess whether this relationship was more complex than the simple hyperbolic relation used so far. To obtain an empirical answer, we plotted abundance against half-life for the 851 genes and fitted the length correction directly. The power relation that we obtained (see formula in Fig. 1 and notes to Table 1) fitted the data globally better than the linear relation used before, but slightly decreased the correlation between prediction and measurement at the level of individual mRNAs (Table 1, columns BS-E and F; Additional file 11: Figure S2). This adjustment also changed the calculated transcription rate.
Next, we made two different models for procyclics. One, PC-A, was based on BS-A , and for the other (PC-B) we fitted a length-abundance power relation as for BS-E and F. We used a longer cell division time, as procyclic forms grow slower than bloodstream forms. We calculated the steady-state mRNA levels for 3776 genes with both models. Comparisons between simulated and real abundances for both models are in Additional file 12: Figure S3. Both models fitted the procyclic data considerably better than any of the bloodstream-form models did for the bloodstream-form data (Table 1, Columns PC-A and PC-B).
Comparing simulated and real abundances of individual mRNAs reveals which transcripts are regulated by processes other than mRNA decay
Model BS-D was used to predict the abundances of the 4088 individual mRNAs with reliable measurements in bloodstream forms. As we previously noted , many mRNAs were less abundant than expected (Fig. 2b). Just 3 mRNAs were more than 4 times more abundant than was predicted by the model, and this time the ribosomal protein mRNAs did not show unusual behaviour. 452 mRNAs were more than 4 times less abundant than predicted for bloodstream forms, as opposed to 127 in procyclic forms. A lower than expected abundance indicates that one or more of the parameters that was used in the modelling for that gene is incorrect. The most likely explanation is that the processing half-time is longer than 1.7 min, since processing can be modulated at the level of individual genes. However, slower transcription or an enhanced susceptibility to nuclear degradation also must be considered.
We compared the RNAs that were less abundant than expected with the remainder of the mRNAs. No functional category was significantly enriched, and genes at the end of transcription units (defined as last three genes before a transcription stop region) were not over-represented. If processing were slower than normal, either splicing or polyadenylation could be affected. Polyadenylation sites are too heterogeneous for automated analysis, but we could analyse the sequences upstream of the open reading frames and dominant splice acceptor sites. There were no significant differences between the groups in the lengths of the polypyrimidine tracts upstream of the splice sites, and no enriched 5′-UTR motifs. In contrast, mRNAs that showed unexpectedly low abundance either in both forms, or in procyclics only, had significantly longer annotated 5′-UTRs than those with “normal” abundance. The transcriptome-wide median 5′-UTR length was estimated at 130 nt . We found a median length of 101 nt for mRNAs that were not under-represented in either stage; 181 nt for mRNAs that were of low abundance in both forms (Student t-test probability that the distribution is identical to the first set was p = 0.001) ; and 144 nt (p = .02) for those with low abundance in procyclics only. There was no significant enrichment of upstream open reading frames in any of the categories. Notably, the median 5′-UTR length for ribosomal protein mRNAs is 22 nt.
Modelling does not yield evidence for regulation of polymerase II transcription
There are three sets of experimental evidence that RNA polymerase II transcribes Kinetoplastid genomes at a constant rate, independent of the chromosome, transcription unit or individual gene. The most convincing results are from nuclear run-on experiments done with Leishmania, but quantitation of such experiments is difficult because the signals are rather low [3, 4]. Second, in T. brucei procyclic forms, integration of a reporter plasmid into four different positions in polymerase II transcription units, upstream of genes encoding tubulin, aldolase, actin, and HSP70, yielded similar levels of expression . Thirdly, bloodstream-form mRNA abundance:half-life ratios across chromosome 10 do not show obvious differences between transcription units .
Optimised transcription rates for selected polycistronic units. Rates of transcription were optimised using mRNA half-life and abundance data to get the best fit, using model PC-B. For each pair of transcription units, divergent transcription initiates within a common region
Polycistronic unit pair
Optimised Transcription rate
Pearson correlations (R)
Number of genes
Next, we examined paired divergent transcription units that share a common initiation region . If transcription really varies, and the initiation rate is determined only by histone modifications around the start sites, then units that share a promoter region should show similar predicted transcription rates. Contrary to this expectation, the transcription rates that were fitted for units sharing an initiation region did not correlate. There are two possible interpretations of this result. One possibility is that in divergent transcription units, the rates of transcription in the two directions are not equal. The alternative is that by chance, different transcription units have different average processing rates. Only direct measurements of transcription can resolve this issue.
Ribosome loading in procyclic forms is affected by the growth conditions
The original aim of our project was to allow transcriptome-wide modelling of gene expression from RNA synthesis to the final steady-state protein level. For this, we needed to measure translation and protein degradation. Two previous analyses of translation by ribosome profiling have revealed very wide differences in mRNA translation efficiencies [20, 21]. To compare models with quantitative data, however, it is essential that the cellular growth conditions for all measurements should be identical. To extend our analysis to translation, we therefore cultured trypanosomes exactly as we had for the measurements of mRNA decay  and subjected samples to ribosome profiling . We analysed three procyclic-form and two bloodstream-form samples (Additional file 18: Table S4). Correlations between similar samples were always greater than 0.96 (not shown).
The total numbers of ribosomes per cell that mapped to each ORF is a measure that combines mRNA level and ribosome density. Interpretation of the results from procyclic forms is complicated because there were differences in several parameters: trypanosome strain, medium composition, and cell density. The Jensen results are from strain TREU927, grown to densities of 5–10 × 106/ml in SDM79 medium, whereas the Silicotryp  results are for strain Lister 427, grown to densities of 1–2 × 106/ml in MEM Pros. Metabolites from 10 % serum are present in both media. MEM Pros has 5 mM proline as an energy source, with about 500 μM glucose from the foetal calf serum, whereas SDM79 also contains 10 mM glucose. MEM Pros contains only adenosine as a purine source, whereas SDM79 also has guanosine and small amounts of pyrimidines.
Procyclic forms grown with abundant glucose are known to convert it mainly to succinate and acetate, with limited dependence on respiratory complexes II and IV and the F1 ATPase . Proline catabolism results mainly in secretion of alanine and glutamate, but succinate is also subject to complex-II-dependent channelling towards pyruvate and gluconeogenesis . The differences that we saw in the two datasets ran contrary to our expectations. We expected the proline-grown cells to express more proline dehydrogenase , since the mRNA level is higher  but no significant differences were observed in the ribosome-bound mRNAs for this or any other enzyme of proline catabolism. Strangely, the proline-dependent MEM-Pros-grown cells showed lower translation of several components involved in mitochondrial electron transport, but greater translation of three proteins involved in glucose metabolism - hexokinase and two subunits of the mitochondrial alternative oxidase.
The increased production of several other proteins in the MEM-Pros grown parasites, such as those involved in kDNA replication, the kinetochore, and DNA repair could be linked to lower cell density, rather than medium composition. This population also had increased translation of adenosine and guanosine kinases, and of an uncharacterised procyclic-form-specific purine transporter. Differences in expression of some RNA-binding proteins might have been responsible for some of the differences in mRNA abundance or translation. For example, excess translation of the stress response regulator ZC3H11 was seen in the MEM-Pros grown parasites, compatible with the observed higher production of HSP83 and mitochondrial HSP60 .
Ribosome density does not correlate with mRNA half-life
Ribosome densities were calculated for bloodstream forms (Additional file 19: Table S5, sheet 2 column O) and procyclic forms (Additional file 19: Table S5, sheet 1 column K). For bloodstream forms, only 22 mRNAs had impossible ribosome densities (less than 30 nt/ribosome); for procyclics there were 69 (Fig. 4c). Such results could be caused by an under-estimation of the number of mRNAs; read counts can also be inaccurate if a gene is internally repetitive or part of a multi-gene family. The “impossible” densities could also be caused by the fact that we assumed that all ribosomes are engaged in translation, whereas in fact, a small portion is present as subunits and we do not know how many monosomes are bound to mRNA (see Legend to Additional file 19: Table S5). The ribosome densities given in this paper are therefore almost certainly over-estimated. The median ribosome density that we calculated was 3.2/kb for procyclics and 2.7/kb for bloodstream forms. Most mRNAs showed similar translation in both stages; 1615 mRNAs were at least 2-fold better translated in procyclic forms, and 629 in bloodstream forms (Fig. 4c). As previously noted [20, 21], there was almost no correlation between mRNA level and ribosome density (not shown).
When mRNAs were split into broad functional categories, there was no significant developmental regulation of ribosome density for any category. We also found few significant differences in ribosome densities between the categories (Additional file 20: Figure S6). In bloodstream forms, mRNAs encoding proteins of glucose and glycerol metabolism had significantly higher ribosome densities than average, with a median of about 130 nt/ribosome. The subset involved in glycolysis were closely packed, with only 50 nt/ribosome, and was also well-translated in procyclics (73 nt/ribosome). Results for alpha and beta tubulin and actin were similar. Citric acid cycle and pentose phosphate pathway mRNAs were also well translated in both forms. Since the citric acid cycle enzymes show no activity in bloodstream forms, it is likely that some of these mitochondrially-targeted proteins are degraded after translation, as was observed for cytochrome c . As previously observed , most mRNAs encoding ribosomal proteins are not particularly well translated (2.8 ribosomes/kb in procyclics, 1.5 in bloodstream forms) although they are abundant and very stable.
The availability of ribosome density and mRNA half-life measurements for cells grown under identical conditions meant that we could now ask questions about how the two processes interact. We therefore examined the hypothesis that packing of ribosomes on the mRNA is correlated with the mRNA half-life. The correlation between ribosome density and half-life in procyclic forms was extremely weak (Fig. 4d, Additional file 19: Table S5, sheet 3), and there was none for bloodstream forms (Additional file 19: Table S5, sheet 4). We therefore decided to look at the mRNAs with half-lives that had been too long to measure using transcription inhibition and RNASeq. Interestingly, the average density on these coding regions was 11–13 ribosomes/kb, whereas the average densities on the mRNAs with measureable half-lives were 5.2 ribosomes/kb for procyclics and 3.7 ribosomes/kb in bloodstream forms. This would be compatible with the idea that dense ribosome packing stabilises an mRNA, but could also simply indicate that these mRNAs have been selected for both high stability and very efficient translation.
Some effects on ribosome loading are passive. A very unstable mRNA cannot have a high ribosome density because it takes time for translation to initiate after an mRNA exits the nucleus . If the mRNA is decapped first, no initiation will occur. In yeast, the interval between initiation events varies from 4–233 s, with a median of 40 s ; on average, an mRNA with a half-time of 5 min and a 40s initiation interval cannot load more than 7 ribosomes before it is decapped. Those mRNAs with long open reading frames and short half-lives would thus be expected to show particularly low ribosome densities. There was, however, no correlation between coding region length and ribosome density, either for the entire dataset or for mRNAs with half-lives of 5–6 min. This suggests that sequence-specific regulatory mechanisms override passive kinetic effects.
In procyclic forms, low average ribosome densities are partially due to lack of translation initiation
The long-term aim of our project is to explain quantitatively the process of trypanosome gene expression from DNA to protein at steady state. Trypanosomes are a good model for this because of their polycistronic transcription - which means that we can, at least initially, assume that transcription is constant for all genes at steady state. The number of steps is also limited. As with all mathematical models of biological processes, some steps can be fed with detailed measurements, whereas for others, parameters have to be modelled to fit the existing data as well as possible. In the case of the phosphoglycerate kinase locus, nearly all parameters could be measured: cell division time, splicing kinetics, mRNA and protein levels and half-lives, and ribosome loading. This allowed the construction of a kinetic model that could be used to estimate the transcription rate and the polypeptide elongation rate . This can readily be extended to other individual genes for which measurements are available . Our ability to extend the model to the entire proteome has so far been constrained by technical limitations: we lack proteome-wide measurements of protein abundance and turnover. Consequently, our genome-wide modeling can currently cover only the steps from the genome to steady-state mRNA levels. Since the best-measured kinetic parameters are the cell division time and the mRNA half-lives, we are able to adjust the model to simulate mRNA abundances, and then use it to predict variations in the other parameters.
In this work, we showed that most of the variation in mRNA abundances in procyclic forms can be predicted based on differences in mRNA half-life and in length-dependent precursor degradation. Only 6 % of mRNAs had abundances in procyclic forms that differed more than 4-fold from the predicted value, and most of those were less abundant than expected. This must be because the mRNA spends longer time as a precursor than is assumed in the model, resulting in excess precursor degradation. This could, in turn, happen because transcription elongation, trans splicing, or polyadenylation are slower than the rates that we fed into the model. Slower processing seems the more likely explanation. The processing half-time that we used in the model was derived from rates that were measured for three very abundant mRNAs. There is plenty of evidence that splicing efficiencies can be influenced by the sequences around the splice acceptor site [42–44]. We found that mRNAs with unexpectedly low abundance in procyclics had somewhat longer annotated 5′-UTRs than normal. The presence of an open reading frame downstream of the splice site is important in splice site choice [43, 44], so it may be that splice acceptor sites that are further from the start codon are used less efficiently.
Deviations from the model predictions were much more marked for bloodstream-form trypanosomes than for procyclics, implying that developmental regulation of processing could be biased towards reduced abundance of mRNAs that are not required in bloodstream forms. Such notions need to be tested by more accurate measurements of processing kinetics. These measurements are technically demanding, but our model predicts which mRNAs are good candidates for such experiments. For example, there are six mitochondrial protein mRNAs that have similar half-lives in bloodstream and procyclic forms, but are much less abundant in bloodstream forms (Additional file 14: Table S2, sheet 2). There have been several recent reports of trypanosome factors that can influence trans splicing [45–48] and it will be interesting to find out whether these target such mRNAs.
Our ribosome profiling experiments confirmed the major role of translation control in trypanosomes, with huge variations between mRNAs in ribosome densities. Unfortunately these results alone cannot be used to predict the translation rate , since a high ribosome density could indicate not only high translation, but also pausing or slow elongation. However, in E. coli and yeast, ribosome profiling data correlates well with protein abundance both within and across coding regions, suggesting that in many cases ribosome profiling approximates absolute protein production . In procyclics, there was a weak correlation between the ribosome occupancy of mRNAs and the proportion of the mRNA that co-migrated with polysomes. This suggests that at least part of the variation in ribosome density is caused by regulated translation initiation: some mRNAs are being translated, whereas others are not. Some of the variation could be between cells at different cell-cycle stages. To understand this further it would be interesting to subject synchronised cell populations, and specific portions of polysome gradients, to ribosome profiling.
We observed a moderate correlation between mRNA half-life and ribosome density in procyclic trypanosomes. There is extensive evidence for cross-talk between translation and the mRNA decay machinery in yeast; although polysomal mRNAs are substrates for the decay pathway [50, 51], there are several circumstances under which mRNAs with poor translation show accelerated decay . However a correlation between ribosome density and mRNA half-life does not necessarily mean that high ribosome densities cause an mRNA to be stable. It may just be that mRNAs that encode abundant proteins are selected for both high abundance and efficient translation.
The regulation of ribosomal protein synthesis seems to be exceptional in at least two ways, especially in procyclic forms. Most of these mRNAs are too stable to allow a half-life measurement in our RNASeq analysis. Those with measurable half-lives, however, were more abundant than expected in procyclic forms, suggesting either very rapid processing or unusual resistance to co-transcriptional degradation. The second peculiarity, which has been noted before , is that the ribosomal mRNAs have strikingly low ribosome densities, although high proportions of these mRNAs are in polysomes. Why could growing cells need extremely efficient production of a stable mRNA, then not use it? Most ribosomal protein mRNA levels are highest in early G1 ; perhaps their translation is also cell-cycle regulated. Alternatively, or in addition, the surplus mRNA may enable rapid synthesis of extra ribosomes when trypanosomes move to a new environment. Interestingly, the “exceptionality” of ribosomal protein mRNAs is not confined to trypanosomes. Ribosomal protein mRNAs also have low ribosome densities in yeast , just as in trypanosomes, but they also show an interesting peculiarity in mRNA synthesis and decay. Miller et al. studied measured transcription and mRNA decay rates in yeast, by in vivo 4-thiouracil labelling followed by RNASeq . They found that in general, mRNA synthesis and decay rates were not correlated for individual mRNAs. In contrast, ribosomal protein genes showed exceptionally high transcription rates, with a strong correlation between their rates of mRNA synthesis and degradation.
The comparison between our modelling results and the data indicate that whereas procyclic-form trypanosomes rely mainly on mRNA decay to determine mRNA levels, many mRNAs in bloodstream-form trypanosomes are subject to an additional layer of regulation. Similarly, whereas the results for procyclic forms suggest that translation is often controlled at the level of initiation, bloodstream-form translation control is more complex. It is generally thought that mammalian-infective trypanosomes evolved from an ancestor that infected only arthropods . Our results suggest that adaptation to the mammalian environment may have involved the acquisition of novel gene expression control mechanisms that operate in the nucleus.
This study did not include the use of any animals, human or otherwise, so did not require ethical approval.
Levels of mRNAs in procyclic form trypanosomes are determined mainly by length and mRNA decay, with some control of precursor processing. In bloodstream forms variations in nuclear events play a larger role in transcriptome regulation, suggesting aquisition of new control mechanisms during adaptation to mammalian parasitism.
Availability of supporting data
The polysome gradient data are available under accession numbers E-MTAB-4558 (bloodstream forms) and E-MTAB-4555 (procyclic forms). The ribosome profiling results are at NCBI with bioproject ID PRJNA315042.
We thank David Ibbersson (Cell Networks and Bioquant, Heidelberg) for preparing the libraries for sequencing at EMBL. We acknowledge a contribution of Vikram Kumar to the bloodstream form polysome data analysis.
This work was supported by the collaborative Sysmo grant “The silicon trypanosome”, with funds from the Bundesministerium für Bildung und Forschung, BBSRC (BB/I004602/1), and the Netherlands Organization for Scientific Research (NWO). Work in Edinburgh was also funded by the Wellcome Trust (088293; 095831). Work in Seattle was supported in part by a grant from the US National Institutes of Health, 1R21 AI094129. DD and IM were supported by the Deutsche Forschungsgemeinshaft, Grant no CL112/17-1. The authors are solely responsible for the content.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Palenchar JB, Bellofatto V. Gene transcription in trypanosomes. Mol Biochem Parasitol. 2006;146:135–41.View ArticlePubMedGoogle Scholar
- Daniels J, Gull K, Wickstead B. Cell biology of the trypanosome genome. Microbiol Mol Biol Rev. 2010;74:552–69.View ArticlePubMedPubMed CentralGoogle Scholar
- Martinez-Calvillo S, Yan S, Nguyen D, Fox M, Stuart K, Myler PJ. Transcription of Leishmania major Friedlin chromosome 1 initiates in both directions within a single region. Mol Cell. 2003;11:1291–9.View ArticlePubMedGoogle Scholar
- Martinez-Calvillo S, Nguyen D, Stuart K, Myler PJ. Transcription initiation and termination on Leishmania major chromosome 3. Eukaryot Cell. 2004;3:506–17.View ArticlePubMedPubMed CentralGoogle Scholar
- Siegel T, Hekstra D, Kemp L, Figueiredo L, Lowell J, Fenyo D, Wang X, Dewell S, Cross G. Four histone variants mark the boundaries of polycistronic transcription units in Trypanosoma brucei. Genes Dev. 2009;23:1063–76.View ArticlePubMedPubMed CentralGoogle Scholar
- Kelly S, Kramer S, Schwede A, Maini P, Gull K, Carrington M. Genome organization is a major component of gene expression control in response to stress and during the cell division cycle in trypanosomes. Open Biol. 2012;2:120033.View ArticlePubMedPubMed CentralGoogle Scholar
- Das A, Banday M, Bellofatto V. RNA polymerase transcription machinery in trypanosomes. Eukaryot Cell. 2008;7:429–34.View ArticlePubMedPubMed CentralGoogle Scholar
- Michaeli S. Trans-splicing in trypanosomes: machinery and its impact on the parasite transcriptome. Future Microbiol. 2011;6(4):459–74.View ArticlePubMedGoogle Scholar
- Siegel T, Hekstra D, Wang X, Dewell S, Cross G. Genome-wide analysis of mRNA abundance in two life-cycle stages of Trypanosoma brucei and identification of splicing and polyadenylation sites. Nucleic Acids Res. 2010;38:4946–57.View ArticlePubMedPubMed CentralGoogle Scholar
- Kolev N, Franklin J, Carmi S, Shi H, Michaeli S, Tschudi C. The transcriptome of the human pathogen Trypanosoma brucei at single-nucleotide resolution. PLoS Pathog. 2010;6:e1001090.View ArticlePubMedPubMed CentralGoogle Scholar
- Hug M, Hotz HR, Hartmann C, Clayton CE. Hierarchies of RNA processing signals in a trypanosome surface antigen mRNA precursor. Mol Cell Biol. 1994;14:7428–35.View ArticlePubMedPubMed CentralGoogle Scholar
- Vassella E, Braun R, Roditi I. Control of polyadenylation and alternative splicing of transcripts from adjacent genes in a procyclin expression site: a dual role for polypyrimidine tracts in trypanosomes? Nucleic Acids Res. 1994;22:1359–64.View ArticlePubMedPubMed CentralGoogle Scholar
- Matthews KR, Tschudi C, Ullu E. A common pyrimidine-rich motif governs trans-splicing and polyadenylation of tubulin polycistronic pre-mRNA in trypanosomes. Genes Dev. 1994;8:491–501.View ArticlePubMedGoogle Scholar
- Ullu E, Matthews KR, Tschudi C. Temporal order of RNA-processing reactions in trypanosomes: rapid trans splicing precedes polyadenylation of newly synthesized tubulin transcripts. Mol Cell Biol. 1993;13(1):720–5.View ArticlePubMedPubMed CentralGoogle Scholar
- Clayton C, Michaeli S. 3′ processing in protists. Wiley interdisciplinary reviews RNA. 2011;2:247–55.View ArticlePubMedGoogle Scholar
- Fadda A, Ryten M, Droll D, Rojas F, Färber V, Haanstra J, Bakker B, Matthews K, Clayton C. Transcriptome-wide analysis of mRNA decay reveals complex degradation kinetics and suggests a role for co-transcriptional degradation in determining mRNA levels. Mol Microbiol. 2014;94:307–26.View ArticlePubMedPubMed CentralGoogle Scholar
- Haanstra J, Stewart M, Luu V-D, van Tuijl A, Westerhoff H, Clayton C, Bakker B. Control and regulation of gene expression: quantitative analysis of the expression of phosphoglycerate kinase in bloodstream form Trypanosoma brucei. J Biol Chem. 2008;283:2495–507.View ArticlePubMedGoogle Scholar
- Clayton CE. Networks of gene expression regulation in Trypanosoma brucei. Mol Biochem Parasitol. 2014;195(2):96–106.View ArticlePubMedGoogle Scholar
- Bakker BM, Krauth-Siegel RL, Clayton C, Matthews K, Girolami M, Westerhoff HV, Michels PAM, Breitling R, Barrett M. The silicon trypanosome. Parasitology. 2010;137:1333–41.View ArticlePubMedGoogle Scholar
- Jensen BC, Ramasamy G, Vasconcelos EJ, Ingolia NT, Myler PJ, Parsons M. Extensive stage-regulation of translation revealed by ribosome profiling of Trypanosoma brucei. BMC Genomics. 2014;15:911.View ArticlePubMedPubMed CentralGoogle Scholar
- Vasquez JJ, Hon CC, Vanselow JT, Schlosser A, Siegel TN. Comparative ribosome profiling reveals extensive translational complexity in different Trypanosoma brucei life cycle stages. Nucleic Acids Res. 2014;42:3623–37.View ArticlePubMedPubMed CentralGoogle Scholar
- Ay A, Arnosti D. Mathematical modeling of gene expression: a guide for the perplexed biologist. Crit Rev Biochem Mol Biol. 2011;46(2):137–51.View ArticlePubMedPubMed CentralGoogle Scholar
- Miller C, Schwalb B, Maier K, Schulz D, Dümcke S, Zacher B, Mayer A, Sydow J, Marcinowski L, Dölken L, et al. Dynamic transcriptome analysis measures rates of mRNA synthesis and decay in yeast. Mol Systems Biol. 2011;7:458.View ArticleGoogle Scholar
- Zeisel A, Köstler W, Molotski N, Tsai J, Krauthgamer R, Jacob-Hirsch J, Rechavi G, Soen Y, Jung S, Yarden Y, et al. Coupled pre-mRNA and mRNA dynamics unveil operational strategies underlying transcriptional responses to stimuli. Mol Systems Biol. 2011;7:529.View ArticleGoogle Scholar
- Manful T, Fadda A, Clayton C. The role of the 5′-3′ exoribonuclease XRNA in transcriptome-wide mRNA degradation. RNA. 2011;17:2039–47.View ArticlePubMedPubMed CentralGoogle Scholar
- Bailey T. DREME: Motif discovery in transcription factor ChIP-seq data. Bioinformatics. 2011;27:1653–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Bailey T, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology: 1994. Menlo Park: AAAI Press; 1994. p. 28–36.Google Scholar
- Jha B, Fadda A, Merce C, Mugo E, Droll D, Clayton C. Depletion of the trypanosome pumilio domain protein PUF2 causes transcriptome changes related to coding region length. Eukaryot Cell. 2014;13:664–74.View ArticlePubMedPubMed CentralGoogle Scholar
- Mulindwa J, Fadda A, Merce C, Matovu E, Enyaru J, Clayton C. Methods to determine the transcriptomes of trypanosomes in mixtures with mammalian cells: the effects of parasite purification and selective cDNA amplification. PLoS Negl Trop Dis. 2014;8:e2806.View ArticlePubMedPubMed CentralGoogle Scholar
- Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang J, van der Ploeg L. Maturation of polycistronic pre-mRNA in Trypanosoma brucei: analysis of trans splicing and poly(A) addition at nascent RNA transcripts from the hsp70 locus. Mol Cell Biol. 1991;11:3180–90.View ArticlePubMedPubMed CentralGoogle Scholar
- McAndrew M, Graham S, Hartmann C, Clayton CE. Testing promoter activity in the trypanosome genome: isolation of a metacyclic-type VSG promoter, and unexpected insights into RNA polymerase II transcription. Exp Parasitol. 1998;90:65–76.View ArticlePubMedGoogle Scholar
- Coustou V, Biran M, Breton M, Guegan F, Riviere L, Plazolles N, Nolan D, Barrett M, Franconi J-M, Bringaud F. Glucose-induced remodeling of intermediary and energy metabolism in procyclic Trypanosoma brucei. J Biol Chem. 2008;283:16342–54.View ArticlePubMedGoogle Scholar
- Lamour N, Riviere L, Coustou V, Coombs GH, Barrett MP, Bringaud F. Proline metabolism in procyclic Trypanosoma brucei is down-regulated in the presence of glucose. J Biol Chem. 2005;280:11902–10.View ArticlePubMedGoogle Scholar
- Droll D, Minia I, Fadda A, Singh A, Stewart M, Queiroz R, Clayton C. Post-transcriptional regulation of the trypanosome heat shock response by a zinc finger protein. PLoS Pathog. 2013;9:e1003286.View ArticlePubMedPubMed CentralGoogle Scholar
- van Dijk EL, Jaszczyszyn Y, Thermes C. Library preparation methods for next-generation sequencing: tone down the bias. Exp Cell Res. 2014;322(1):12–20.View ArticlePubMedGoogle Scholar
- Lahens NF, Kavakli IH, Zhang R, Hayer K, Black MB, Dueck H, Pizarro A, Kim J, Irizarry R, Thomas RS, et al. IVT-seq reveals extreme bias in RNA sequencing. Genome Biol. 2014;15(6):R86.View ArticlePubMedPubMed CentralGoogle Scholar
- Alberti A, Belser C, Engelen S, Bertrand L, Orvain C, Brinas L, Cruaud C, Giraut L, Da Silva C, Firmo C, et al. Comparison of library preparation methods reveals their impact on interpretation of metatranscriptomic data. BMC Genomics. 2014;15:912.View ArticlePubMedPubMed CentralGoogle Scholar
- Torri AF, Bertrand KI, Hajduk SL. Protein stability regulates the expression of cytochrome c during the developmental cycle of Trypanosoma brucei. Mol Biochem Parasitol. 1993;57:305–16.View ArticlePubMedGoogle Scholar
- Schott J, Reitter S, Philipp J, Haneke K, Schafer H, Stoecklin G. Translational regulation of specific mRNAs controls feedback inhibition and survival during macrophage activation. PLoS Genet. 2014;10(6):e1004368.View ArticlePubMedPubMed CentralGoogle Scholar
- Shah P, Ding Y, Niemczyk M, Kudla G, Plotkin JB. Rate-limiting steps in yeast protein translation. Cell. 2013;153(7):1589–601.View ArticlePubMedPubMed CentralGoogle Scholar
- Siegel T, Tan K, Cross G. Systematic study of sequence motifs for RNA trans splicing in Trypanosoma brucei. Mol Cell Biol. 2005;25:9586–94.View ArticlePubMedPubMed CentralGoogle Scholar
- Lopez-Estrano C, Tschudi C, Ullu E. Exonic sequences in the 5′ untranslated region of alpha-tubulin mRNA modulate trans splicing in Trypanosoma brucei. Mol Cell Biol. 1998;18:4620–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Hartmann C, Hotz H-R, McAndrew M, Clayton C. Effect of multiple downstream splice sites on polyadenylation in Trypanosoma brucei. Mol Biochem Parasit. 1998;93:149–52.View ArticleGoogle Scholar
- Stern M, Gupta S, Salmon-Divon M, Haham T, Barda O, Levi S, Wachtel C, Nilsen T, Michaeli S. Multiple roles for polypyrimidine tract binding (PTB) proteins in trypanosome RNA metabolism. RNA. 2009;15:648–65.View ArticlePubMedPubMed CentralGoogle Scholar
- Gupta SK, Chikne V, Eliaz D, Tkacz ID, Naboishchikov I, Carmi S, Waldman Ben-Asher H, Michaeli S. Two splicing factors carrying serine-arginine motifs, TSR1 and TSR1IP, regulate splicing, mRNA stability, and rRNA processing in Trypanosoma brucei. RNA Biol. 2014;11(6):751–31.View ArticleGoogle Scholar
- Gupta SK, Kosti I, Plaut G, Pivko A, Tkacz ID, Cohen-Chalamish S, Biswas DK, Wachtel C, Waldman Ben-Asher H, Carmi S, et al. The hnRNP F/H homologue of Trypanosoma brucei is differentially expressed in the two life cycle stages of the parasite and regulates splicing and mRNA stability. Nucleic Acids Res. 2013;41:6577–94.View ArticlePubMedPubMed CentralGoogle Scholar
- Naguleswaran A, Gunasekera K, Schimanski B, Heller M, Hemphill A, Ochsenreiter T, Roditi I. Trypanosoma brucei RRM1 is a nuclear RNA-binding protein and modulator of chromatin structure. MBio. 2015;6(2):e00114.View ArticlePubMedPubMed CentralGoogle Scholar
- Li GW, Burkhardt D, Gross C, Weissman JS. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell. 2014;157(3):624–35.View ArticlePubMedPubMed CentralGoogle Scholar
- Hu W, Sweet T, Chamnongpol S, Baker K, Coller J. Co-translational mRNA decay in Saccharomyces cerevisiae. Nature. 2009;461:225–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Pelechano V, Wei W, Steinmetz LM. Widespread Co-translational RNA Decay Reveals Ribosome Dynamics. Cell. 2015;161(6):1400–12.View ArticlePubMedGoogle Scholar
- Huch S, Nissan T. Interrelations between translation and general mRNA degradation in yeast. Wiley interdisciplinary reviews RNA. 2014;5(6):747–63.View ArticlePubMedPubMed CentralGoogle Scholar
- Archer S, Inchaustegui D, de Queiroz R, Clayton C. The cell-cycle regulated transcriptome of an early-branching eukaryote. PLoS One. 2011;6:e18425.View ArticlePubMedPubMed CentralGoogle Scholar
- Ingolia N, Ghaemmaghami S, Newman J, Weissman J. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324:218–23.View ArticlePubMedPubMed CentralGoogle Scholar
- Lukes J, Skalicky T, Tyc J, Votypka J, Yurchenko V. Evolution of parasitism in kinetoplastid flagellates. Mol Biochem Parasitol. 2014;195(2):115–22.View ArticlePubMedGoogle Scholar
- Calderano SG, Drosopoulos WC, Quaresma MM, Marques CA, Kosiyatrakul S, McCulloch R, Schildkraut CL, Elias MC. Single molecule analysis of Trypanosoma brucei DNA replication dynamics. Nucleic Acids Res. 2015;43(5):2655–65.View ArticlePubMedPubMed CentralGoogle Scholar
- Deneke C, Lipowsky R, Valleriani A. Complex degradation processes lead to non-exponential decay patterns and age-dependent decay rates of messenger RNA. PLoS One. 2013;8:e55442.View ArticlePubMedPubMed CentralGoogle Scholar