Widespread variation in transcript abundance within and across developmental stages of Trypanosoma brucei

Background Trypanosoma brucei, the causative agent of African sleeping sickness, undergoes a complex developmental cycle that takes place in mammalian and insect hosts and is accompanied by changes in metabolism and cellular morphology. While differences in mRNA expression have been described for many genes, genome-wide expression analyses have been largely lacking. Trypanosomatids represent a unique case in eukaryotes in that they transcribe protein-coding genes as large polycistronic units, and rarely regulate gene expression at the level of transcription initiation. Results Here we present a comprehensive analysis of mRNA expression in several stages of parasite development. Utilizing microarrays that have multiple copies of multiple probes for each gene, we were able to demonstrate with a high degree of statistical confidence that approximately one-fourth of genes show differences in mRNA expression levels in the stages examined. These include complex patterns of gene expression within gene families, including the large family of variant surface glycoproteins (VSGs) and their relatives, where we have identified a number of constitutively expressed family members. Furthermore, we were able to assess the relative abundance of all transcripts in each stage, identifying the genes that are either weakly or highly expressed. Very few genes show no evidence of expression. Conclusion Despite the lack of gene regulation at the level of transcription initiation, our results reveal extensive regulation of mRNA abundance associated with different life cycle and growth stages. In addition, analysis of variant surface glycoprotein gene expression reveals a more complex picture than previously thought. These data provide a valuable resource to the community of researchers studying this lethal agent.

pant antigenic variation in which hundreds of variant surface glycoproteins (VSGs) are sequentially displayed on the parasite surface [1]. Additionally, this group of organisms, along with their relatives Trypanosoma congolense and Trypanosoma vivax, have hampered the development of sub-Saharan Africa by their severe effects on cattle and draft animals.
In both the mammalian host and the insect vector (tsetse fly), T. brucei undergoes multiple developmental changes that are reflected by changes in morphology, surface proteins, cell division, and metabolism, although all stages are extracellular and motile, courtesy of a single flagellum. When first injected into the mammalian host by the bite of a tsetse fly, the stationary metacyclic phase parasites reenter the cell cycle as rapidly dividing slender bloodstream forms (BF). This stage relies on glycolysis for the generation of ATP, and has a single tubular mitochondrion. After several days, short stumpy forms appear. These forms are arrested in G 0 /G 1 and still express VSG, but show modest metabolic changes that appear to presage the next stage of parasite development [2][3][4]. When taken up by the fly, they readily transform into procyclic forms (PF) that multiply in the insect midgut. This stage of the parasite expresses a different surface coat, made up of a small family of proteins termed procyclins [5]. Glycolysis is down-regulated, amino acid metabolism is induced, and the mitochondrion enlarges and more elaborate cristae develop. After one to two weeks, the PFs journey to the salivary glands where they transform into epimastigotes and then into mammalian-infective metacyclic forms [6]. While T. brucei slender BFs and PFs are routinely cultured in vitro, epimastigotes and metacyclic forms are not readily available.
In contrast to most other organisms, trypanosomatids do not regulate gene expression at the transcriptional level, except for the major surface antigens of African trypanosomes such as T. brucei [7]. Almost all genes are transcribed as large polycistronic clusters, but adjacent genes may show widely differing levels of mRNA, and these levels may vary during the parasite life cycle. This developmental regulation of mRNA abundance is largely attained via changes in decay rate, predominantly attributed to sequences within the 3' untranslated region (UTR) [8]. Early studies of changes in gene expression between slender BF and PF suggested that only about 2% of probes showed significant changes in signal between stages [9,10]. However, these initial studies relied on arrays that utilized relatively large anonymous fragments of the genome as probes (~2.25 kb), and used a cutoff of 2.5fold change between minimum and maximum signal. The elucidation of the complete genome of T. brucei strain 927 in 2005 [11] affords the opportunity to examine global changes in mRNA expression using a gene-specific approach. A recent study examined a subset of about 550 genes for their expression in BF as compared to PF parasites [12]. The genes were selected on the basis of their potential for regulation, based on analysis of the literature, and many were found to be stage-regulated. More recently, microarrays have been used to search for changes in gene expression following knockdown or knockout of genes encoding RNA binding proteins [13,14].
Until now, the microarray studies performed on T. brucei have been either sub-genomic or have utilized a single probe for each gene. In this study, we have used Nimblegen microarrays that employed eight probes per gene to assess gene expression in T. brucei 927, examining the stages which are readily obtained in the laboratory. Those stages included in vitro cultured slender BF (cBF), as well as slender and stumpy BF harvested from infected rats. We also studied both log and stationary phase PFs (PF-log, PF-stat). Our results revealed that mRNA expression in cBF was little different from that seen in the slender BFs harvested from rats. A modest number (~100) of changes were seen in stumpy BF. As expected, many changes were observed when cBF and PF-log parasites were compared, with several hundred genes showing at least a 2-fold change in mRNA abundance. Surprisingly, comparison of PF-log and PF-stat parasites showed an even larger number of changes. The arrays not only allowed us to compare differences between stages, but also to examine expression levels of various genes within a sample. About half of genes ranked in the top 10% of signals in cBF were also expressed to high levels in all other stages. A significant fraction of these highly expressed genes encoded hypothetical proteins. Very few genes showed no evidence of expression.

Results and Discussion
This study reports our findings on changes in gene expression between those stages of T. brucei that can be readily studied in the laboratory. During natural infections in the vertebrate host, T. brucei progresses from the actively dividing slender BF to the non-dividing stumpy BF. This form is primed for differentiation to procyclic forms in the insect host. We examined the expression level of nearly all predicted protein-coding genes of strain 927 T. brucei, as well as many RNA genes, in five different parasite populations. Most of the analyses reported here are restricted to nuclearly encoded transcripts, although some findings for mitochondrially derived transcripts are noted. Three biological replicates were used for each test condition. Three of the five sets were derived from in vitro culture under highly standardized conditions: log-phase cultured bloodstream forms (cBF), log-phase cultured procyclic forms (PF-log), and stationary-phase procyclic forms (PFstat) (see Additional file 1). The remaining two samples, slender BF and stumpy BF, were derived from infected rats and were likely to show more inter-sample variability, since they were grown for varying times in rats, and the animals became progressively less healthy after irradiation and infection. Since the stumpy BF populations were the last to be harvested and each contained a variable proportion of intermediately differentiated forms, these populations were the most biologically variable. They ranged from 76% to 93% morphologically stumpy and 68% to 88% of cells expressed the PF marker EP procyclin upon appropriate stimulation (see Table 1). The slender BF populations showed less than 5% intermediate or stumpy forms and less than 1% expressed procyclin after stimulation.
The Nimblegen arrays that were hybridized with cDNA prepared from each parasite population contained multiple probes per gene (in most cases, eight), and three copies of these probe-sets per chip. After normalization to allow for cross-chip comparisons, we calculated a single value for each set of triplicate probes using Tukey's biweight formula. We then obtained a single gene level value using the Tukey biweight of the signals for the probes corresponding to each gene (Additional file 2). In most cases, this value represented the "average" of 72 data points collected for every protein-coding gene (45 for RNA genes) in each of the five growth conditions. Thus, as detailed in Methods, we were able to utilize robust statistical analyses that increase our confidence in the findings of the comparisons reported below.

Gene expression levels
The normalized expression values from the 8110 probesets corresponding to nuclear genes were hierarchically clustered using the TMeV software package and are shown graphically as a heat map in Figure 1A. A small number of genes (those colored black or dark blue in the middle of the figure) showed no or very low expression under any condition, and a similar number (colored red) showed high expression levels in some (top) or all (bottom) conditions. However, the large majority of the genes showed low (blue) or moderate (green or yellow) expression levels. Each biological sample set showed a similar distribution of signal intensities obtained for the 8004 probe-sets corresponding to nuclear protein-coding genes, with the curves obtained from cBF and PF-stat (the most divergent samples) shown Figure 1B. This figure also depicts the signals obtained for the 345 probe-sets that map to multicopy CDSs. This curve was skewed dramatically to the right, as compared to the total sample of probe-sets, confirming gene amplification as a clear strategy for increasing gene expression in T. brucei. Indeed, 20 of the 50 probe-sets detecting the highest signals map to two or more genes (see below and Table 2). However, the signals did not follow in rank order of number of genes detected by the probe-sets.
For the total probe-sets, there is a large peak centered at 4600 for the PF-stat and ~5800 for cBF. In both cases, the peak moves sharply down towards lower values, with a small shoulder at ~200. When the same arrays were probed with RNA derived from a different strain, this shoulder was more pronounced, and a corresponding increase in probe-sets with signal intensities less than 200 was observed (unpublished data). Since many of these probe-sets mapped to VSGs, most of which are not conserved between strains, a signal level of ~200 can be taken as a generous estimate of background for non-transcribed regions. However, almost all regions of the T. brucei genome are thought to be constitutively transcribed, and hence even those genes whose mRNAs are unstable would likely show a signal higher than this background. The number of probe-sets that failed to show a signal level of less than four times the "non-transcribed background" signal (i.e., 800) in least one of the five different biological conditions was small (164 protein-coding genes). Even then, VSG genes, which are subject to clonal variations in expression, accounted for all but 65 of this set. The majority (55) of the remaining genes have unknown function, with most of these (44) found only in T. brucei, raising the possibility that they do not represent authentic genes. In addition, almost all of these low-expressing genes are located in sub-telomeric clusters of VSGs or expression site associated genes (ESAGs) or are immediately adjacent to convergent strand-switch regions where transcription terminates [15,16]. Even when considering only cBF parasites, only 133 predicted non-VSG protein-coding genes  had a signal <800, and of these only 22 had annotated function. Notably, most of these genes were expressed to higher levels in at least one other stage, and those that were low in all stages were located in a sub-telomeric or convergent strand-switch region context.
The protein-coding genes were ranked according to their maximal expression level in any stage, and broadly categorized according to their annotated function ( Figure 1C). As suggested above, VSG genes and T. brucei-specific genes of unknown function accounted for most of the genes showing the lowest expression (86% of the bottom 2% and 64% of the bottom 5%). Conversely, genes which are more highly expressed tend to have some type of functional annotation; indeed many of these have been studied experimentally.
We identified the nuclear CDSs with the top 10% of signals for each biological condition. The genes were individually examined and placed into categories based on their annotation and/or their proteomic detection in specific sub-cellular fractions. Figure 2A shows the distribution of these genes into broad categories as in Figure 1C, while Figure 2B shows the distribution of genes with ascribed function (yellow in Figure 2A) according to various categories of biological function or location for the each of the five different biological conditions. As indicated in Figure  1C, above, genes with unknown function were under-represented in the highly expressed category, as compared to the whole genome, and this was most striking in the PFlog cells (Figure 2A). Overall, the categories with the largest numbers of highly expressed genes were translation, metabolism, or cytoskeleton, with the majority (99 out of 144) of the genes in the translation category encoding cytoplasmic ribosomal proteins. Genes involved in mitochondrial function were highly represented in the top expressors in the PF stages, but more modestly represented in BF stages. As expected, significantly higher proportions of ESAGs or genes related to them (GRESAGs) were expressed at high levels in all BF stages than in the PF stages. In general, relatively similar proportion of genes were found in each category in all biological stages, although the number of genes involved in metabolism was increased in PF-log and genes encoding cytoskeleton proteins decreased in stumpy BF. However, this does not mean that the same genes are expressed to high levels in all stages. For example, 13% of mRNAs highly expressed in cBF were more than 2-fold up-regulated as compared to PF and similarly 12% of mRNAs highly expressed in PF were more than 2-fold up-regulated as compared to cBF (see below).
Among the 800 most highly expressed nuclear CDSs in cBF, 307 were highly expressed in every condition examined. Surprisingly, 111 of these had unknown function. Thus, there is a large set of highly expressed T. brucei-specific (20) or conserved (91) genes that have not been ascribed a function; including nine of the 50 most highly expressed genes in cBF (see Table 2). Some of the "hypothetical" proteins encoded by these nine genes have been shown to exist by proteomic analyses (Tb11.01.2800, Tb10.6k15.1500) or other studies (Tb927.1.4600 [13]). Conversely, two other "hypothetical genes" lie in intergenic regions between coding regions that are highly expressed, raising the possibility that these are not separate genes, but rather simply sequences within 3' UTRs of  Figure 1C. B. Further breakdown of functional category. Only those categories with at least ten genes at one condition are shown. The functional (blue bar) and location (red bar) categories are indicated below the X-axis. the neighboring genes. The first, as represented by Tb927.1.2540, is one of a set of seven almost identical putative genes (six of which are annotated as "hypothetical protein, unlikely") that are interspersed between the histone H3 genes. The second, as represented by Tb927.1.4590, corresponds to a set of genes that are interspersed between the highly expressed CFB1 genes. Tb11.1190 encodes a putative protein which is composed of 49 repeats of 68 amino acids and is represented on the microarray by a single probe corresponding to a unique sequence at the C-terminus. Tb11.02.4690 specifies a 22 kDa protein with a signal sequence and three transmembrane domains. It is expressed to a much higher level than the flanking genes, indicating it is a distinct mRNA. The last of these nine genes, Tb927.4.1000, encodes a 25 kDa protein that is also expressed much more highly than the adjacent genes.

Differential gene expression
Comparison of the Tukey mean maximum and minimum signal levels for all probe-sets corresponding to nuclear genes revealed 122 genes that showed a greater than 10fold change between two or more of the five biological conditions tested. Of these, 30 were VSG, VSG-related (VR) genes, or ESAG/GRESAGs, many of which are associated with antigenic variation. A total of 446 genes (including 161 VSGs and ESAGs) showed more than 4-fold variation, while at the 2-fold level, 2105 genes (including 233 VSGs and ESAGs) showed a statistically supported difference in expression. Thus, over one-fourth of all of the genes assessed on the microarray were differentially expressed (i.e. q-value of <5% in multi-class significance analysis of microarrays (SAM)) between at least two conditions. This dataset was further reduced to those showing a >2-fold deviation from the mean of all five conditions in at least one sample, and by excluding all genes encoding VSG/VRs and ESAG/GRESAGs (which were analyzed separately, see below). These 534 probe-sets were K-median clustered using settings indicated in the Methods to yield nine clusters which contained between 31 and 80 genes ( Figure 3 and Additional file 3). Each of these clusters represents a distinct pattern of gene expression, although some are similar. Overall, these highly regulated genes are enriched in those involved in metabolism, proteolysis, translation and T. brucei-specific unknown functions, but under-represented in those genes that are conserved but have unknown function. However, individual clusters show differential enrichment in particular functional gene categories. Figure 3A shows the heatmap depicting gene clustering, while Figure 3B shows overlaid expression patterns graphically for every gene in each cluster. Figure 3C depicts specific examples that illustrate the expression patterns characteristic of the gene clusters. Genes in clusters 5, 6, 7 and 8 all have higher mRNA levels in BF than in PF, although the clusters each show subtle differences in expression pattern. For example, cluster 8 contains genes that encode mRNAs substantially down-regulated in PFlog and PF-stat, whereas in cluster 6 the down-regulation in PF is more modest and some genes begin to decrease expression in stumpy BF. Cluster 8 is enriched in genes involved in metabolism and adenosine transport and also includes several genes encoding 64-65 kDa invariant surface glycoproteins (ISGs) and procyclin-associated genes (PAGs). Cluster 6 contains genes encoding three 75 kDa ISGs, several proteases, and a substantial number of T. brucei-specific proteins of unknown function. Cluster 7 contains genes that are somewhat down-regulated in PFlog (relative to BF), but are expressed at even lower levels in PF-stat. Genes encoding proteins involved in interaction, metabolism, protein folding and protein transport or modification, are overrepresented in this cluster. Conversely, cluster 5 contains 60 genes that are down-regulated to a greater extent in PF-log than in PF-stat. This cluster is dominated by genes encoding proteases and T. brucei-specific proteins with unknown function. The former category includes several paralogues encoding homologues of the Leishmania gp63 surface protease.
Clusters 2, 3, and 4 show different patterns of up-regulation with respect to the two PF biological conditions. Cluster 2 contains genes that are up-regulated in PF-log, but not in PF-stat. For some genes this change in expression begins in stumpy BFs. The genes in this cluster are over-represented for those encoding proteins involved in interaction, metabolism, RNA processing, transcription, translation and cytoskeleton function. Their reduced expression in PF-stat is consistent with cessation of growth functions upon entry into stationary phase. Indeed we observed a significant accumulation of rRNA precursors in the PF-stat samples upon analysis on the Agilent BioAnalyzer (not shown). Conversely, cluster 3 contains genes that are up-regulated only in PF-stat and mostly have unknown function, including eight that are T. brucei-specific. It is possible that some of these gene products are involved in preparation for differentiation into epimastigotes, the next stage in the parasite life cycle. Finally, cluster 4 contains genes with higher expression levels in both PFlog and PF-stat and is enriched in genes encoding proteins involved in metabolism, proteolysis, or with unknown function but located on the cell surface or mitochondrion. As discussed in more detail below, this is consistent with the switch to mitochondrial pathways for energy generation in procyclics.
Cluster 1 contains the largest number of genes, with 82 members. The expression of these genes is lower in PF-stat (and stumpy BF in some cases), but the genes are expressed at higher levels in cBF and slender BF than seen in cluster 2. These genes are over-represented in those involved in metabolism, DNA replication/repair, protein folding, proteolysis and translation, consistent with their down-regulation in stationary-phase cells. Cluster 9, with 31 members, is the smallest cluster. The pattern of gene regulation is similar to cluster 1, but with some up-regulation in PF-log. Like cluster 1, this cluster is enriched in genes involved in DNA replication/repair.
The existence of these varied expression patterns implies a complex set of regulatory mechanisms operating at the RNA level to control the abundance of transcripts encoded by nuclear genes. The specific proteins involved in these processes are only beginning to be examined (see for example refs. [13,14,17]).
In contrast to the analyses above that examined the transcripts showing the most variation in abundance, we also looked at the transcripts that showed the least variation. Genes such as these would provide excellent controls for studies of developmental changes in gene expression. We identified 830 genes with a maximum variation in expression between the five stages of <25% (see Additional file 2). As expected, genes encoding proteins involved in Cluster analysis of highly-regulated mRNAs Figure 3 Cluster analysis of highly-regulated mRNAs. The 534 genes showing two-fold deviation from the mean expression value in at least one of the 5 samples using multi-class SAM analysis were analyzed. ESAGs, GRESAGs, VSGs, and VRs were excluded from the analysis. A. Heatmap. The genes were subjected to K-median clustering of the log 2 ratios into 9 clusters, as numbered. B. Signals corresponding to individual probe-sets within each cluster. For each probe-set, the signal for each biological condition was compared to the mean of those signals across conditions (log 2 ratio). The biological conditions from left to right are: cBF, slender BF, stumpy BF, PF-log and PF-stat. C. Tukey mean expression values across biological conditions for a single gene from each cluster, as numbered.  known stage-regulated processes such as glycolysis and electron transport are significantly under-represented. However, the group is slightly enriched for genes of unknown function. Genes encoding proteins involved in lipid or fatty acid metabolism are also over-represented, comprising 28% of the metabolic enzymes that showed little variation as opposed to 11% of all metabolic enzymes. Similarly, genes encoding proteins of the ubiquitin pathway represent 40% of all protease-related genes, but 85% of the subset of protease-related genes that showed little variation. Genes involved in histone acetylation or chromatin structure, such as Tb927.7.1690 and Tb927.4.2520 (which encode transcriptional silencer Sir2) also tended to maintain similar mRNA levels between life cycle stages.

Comparison of cBF and log phase PF forms
In order to identify differences in gene expression between specific conditions, we conducted pair-wise comparisons of specific datasets (including cBF versus slender BF, slender BF versus stumpy BF, cBF versus PF-log, and PF-log versus PF-stat) using SAM, setting the q-value to <5% and the fold-change to >2 (see Methods). Because VSG expression is both clonal and highly variable, VSGs are excluded from the gene tallies below, unless otherwise noted.
Comparing the signals between cBF and PF-log, 691 genes were found to be differentially expressed. When the stringency of the SAM analysis was reduced to a 1.7-fold change, 963 genes were detected. A further reduction to 1.5-fold identified 1508 genes--approximately 19% of the genome. Thus, a relatively large fraction of the genome encodes mRNAs that differ in abundance between these two stages. Figure 4A shows a comparison of the functional categories of the genes showing >2-fold regulation; these are individually listed (along with their fold-changes in mRNA expression and q-values) in Additional file 4. Table 3 itemizes those genes upregulated in cBF that have predicted functions (excluding VSGs and ESAGS, which are discussed below).
As can be seen in Figure 4A, categories of genes where cBF show higher expression than PF-log cells include ESAGs and GRESAGs, uncharacterized proteins bearing interaction motifs (such as zinc fingers and leucine-rich repeats), and known surface and secreted proteins (in part because there are multiple distinct genes in several surface protein families). However, it is also interesting that a larger number of genes upregulated in cBF encode proteins with hypothetical status (conserved or T. brucei-specific) that have signal sequences. This fits well with the finding that the secretory and endocytic systems are more active in BF than PF [18]. However, unlike Koumandou et al. [12], we did not find that mRNAs specifying proteins involved in secretory traffic were highly up-regulated in BF.
Categories with more representatives up-regulated in PFlog cells include those encoding mitochondrial proteins, metabolic proteins, and translation. It is known that the metabolism of PF (which can use both glucose and amino acids for energy metabolism) is more complex than BF (which are highly glycolytic) [19], presumably accounting for the more diverse set of metabolic genes up-regulated in PF-log cells. In PF, the mitochondrion becomes enlarged with more fully developed cristae and the respiratory chain is active [2]. These changes are reflected in the increased expression of a large number of genes (72) encoding products associated with the mitochondrion, including 27 that are of unknown function. In contrast, a single gene known to encode a mitochondrial protein is upregulated in cBF: the alternative oxidase. This oxidase is required for the glycerophosphate shuttle that allows glycolysis to continue [20].
The comparison of mRNA abundance between these two stages led to the identification of several groups of interesting genes. These include those encoding nucleoside transporters NT2-NT7 which reside in an array immediately adjacent to the sub-telomeric VSG cluster at the "right" end of chromosome 2. There they alternate with a set of iron-ascorbate oxidoreductase genes (see Tb927.2.6180, Tb927.2.6230, Tb927.2.6310) that have not been functionally characterized to our knowledge.
The NT genes were reported to be more highly expressed in BF than PF forms, a finding which we also observe [21]. Interestingly, all of these oxidoreductase genes are also significantly more highly expressed in cBF than PF-log (3.3-13.2-fold, see Table 3). Three additional iron/ascorbate oxidoreductase genes are found on chromosomes 5, 7, and 9 --these are each expressed to similar levels in cBF and PF-log. Thus, the chromosome 2 region represents a rare cluster of genes encoding similarly regulated mRNAs. Another interesting case is that of VSP1, an acidocalcisomal pyrophosphatase encoded by two tandemly linked genes (Tb11.02.4910 and Tb11.02.4930) with almost identical coding regions [22]. The array data show that the two genes are reciprocally regulated, which may potentially be traced to their divergent 3' UTRs.

Comparison of gene expression in BF under different conditions
We compared the expression of all nuclear genes in slender BF isolated from infected animals with the expression in slender BF obtained by in vitro culture (cBF) and the expression in stumpy BF from animals. In comparison of cBF and slender BF, other than a few VSG genes, no gene showed a difference in expression that met our criteria of a 2-fold change and q-value < 5%. Additional file 5 lists those genes that showed more moderate (>1.5-fold) or less well-supported changes (q-value < 15). However, two ISG64 genes (Tb927.5.1390 and Tb927.5.1430) showed a slightly lower (1.6-1.8-fold), but high-confidence increase in signal in slender BF. A few other genes showed similar (1.5-1.9-fold) changes in expression, but had somewhat lower confidence (q-value = 7.9). These included a CAMK group protein kinase (Tb927.7.6580), a nucleoside phosphorylase (Tb927.8.4430), a tryparedoxin (Tb927.3.5090) and two proteins with unknown function that had higher signals in cBF, and a GRESAG4 that had a higher signal in slender BF. These data contrast with a previous microarray study examining 550 genes that found 35 were upregulated in cBF and 3 were upregulated in slender BF [12]. None of those 38 genes correspond to the few genes that we identified above. Two sets of genes that we identified as modestly upregulated were on the previous array, but these were not observed to be upregulated in that analysis. The lack of consistency between the two studies in this regard could arise from differences in the strains or conditions (e.g., medium, serum, use of intact vs immunocompromised animals). Nonetheless both stud- a ESAGs, GRESAGS, VSGs are not included. A complete list of the genes is found in Additional file 3. b Bold: gene was upregulated in all BF vs all PF conditions. Probe-sets detecting >1 gene indicated by "+", plus number of additional genes. * marks known stage-regulated genes used in previous array study [12]. Pairwise comparison of gene expression in different biological conditions A comparison of the rapidly dividing slender BF with the non-dividing stumpy BF showed a total of 107 genes with at least a 2-fold change in signal in the arrays, not including VSGs. About twice as many genes were up-regulated in slender forms (Table 4) as were up-regulated in stumpy forms ( Table 5). The most prominent categories of genes showing increased signals in slender forms are those that are related to the cytoskeleton, including the flagellum ( Figure 4B). Many of these genes are annotated as hypo-thetical proteins, but they were detected in the flagellar proteome [23]. Non-dividing forms do not build new flagella or cytoskeleton. Additionally, several metabolic enzymes were up-regulated, predominantly those which are localized to the glycosome (a specialized peroxisome) or are involved in glycolysis. Conversely, the entire set of eight ESAG9 genes in the 927 strain were upregulated in stumpy BF (from 2-fold to 30-fold), as were two genes that are related to ESAG9. The function of ESAG9 is not known; it was originally described as a gene found in a VSG expression site (ES) in the closely related parasite Trypanosoma equiperdum [24]. At that time, the authors noted that a related ESAG9 was transcribed independently of the VSG ES. Seven of the eight annotated ESAG9 genes encode proteins with a predicted signal sequence, but none of these contain predicted transmembrane domains, suggesting the ESAG9s could encode a family of secreted proteins. The metabolic enzymes encoded by genes with higher mRNA levels in stumpy BF were predominantly mitochondrial, consistent with pre-adaptation for differentiation into insect forms. We also noted the increased mRNA for the PAD1 and PAD2 genes, which encode citrate transporters and were previously shown to be upregulated in stumpy forms of T. brucei strain EATRO 2340 [25].

Comparison of gene expression of PF in different conditions
Unlike cBF, in vitro cultured PF can be grown to stationary phase where they can persist for several days as viable cultures. Thus, we could directly compare the abundance of mRNAs in actively replicating (PF-log) and non-dividing (PF-stat) cells. A total of 895 genes showed differential expression (see Additional file 6), many more than the 107 genes differentially regulated between with the slender (log) versus stumpy (stationary) BF. About three times as many genes were up-regulated in PF-log as compared to PF-stat. As shown in Figure 4C, this increase was reflected across almost all categories of genes, except for proteins categorized as unknown (both conserved and T. bruceispecific). The most skewed group was genes annotated as encoding hypothetical proteins (conserved or T. bruceispecific) that have predicted transmembrane domains -many more such genes were upregulated in stationary phase than in log phase. For proteins with ascribed function, those associated with protein phosphorylation/ dephosphorylation were enriched in stationary phase. As discussed above, some of changes in PF-stat may reflect the decrease in cellular growth functions, or perhaps preparation for development to epimastigotes. It is also possible that some transcripts with higher signals in PF-stat are simply those that decay most slowly.

Genes encoded by the mitochondrial genome
Several genes on the mitochondrial maxicircle genome are extensively remodeled by RNA editing to yield transcripts encoding components of mitochondrial respiratory complexes. Only 15 mitochondrial probe-sets could be designed (see Additional file 7). Four corresponded to both edited and unedited sequences, and six to neveredited sequences, including the two rRNAs. Three corresponded to edited sequences, two of which had corresponding unedited probe-sets. From this limited set, a few trends could be observed, which were compatible with prior literature [26][27][28]. For example, 12S and 9S rRNA, cytochrome b, cytochrome oxidase subunit I, and cyto- chrome oxidase subunit II (edited plus unedited) transcripts all increased in stumpy BF and further increased in PF-log, although some did not reach statistical significance until PF-log phase. Somewhat surprisingly, ATP synthase subunit 6 (edited), NADH dehydrogenase subunit 5, NADH dehydrogenase subunit 7 (edited) and NADH dehydrogenase subunit 8 (edited) all showed increased mRNA levels in stumpy BF, but decreased in PF-log. Unexpectedly, many of the signals reached their maximum in PF-stat. This could reflect a potential differential stability as compared to the nuclearly-encoded transcripts under conditions of growth arrest, and would be highlighted by the normalization procedure.

Gene families
We noted several tandem arrays of gene families containing non-identical genes that were differentially regulated. Three families encoding proteins with multiple transmembrane domains are depicted in Figure 5. The first cluster ( Figure 5A) of genes are those in the recently described PAD array of carboxylate transporters [25]. In contrast to PAD1 and PAD2, which are induced in stumpy BF [25], the other members of this gene family are either constitutively expressed at the mRNA level or more highly expressed in PF. PAD5 and PAD7 show an increase in expression from stumpy forms to PF-log and even higher expression in PF-stat. PAD8 showed similar expression in all conditions except in PF-stat, which was ~1.5-fold increased over PF-log (q-value = 4.89). Figure 5B shows an unrelated gene family on chromosome 10 that also encodes major facilitator proteins. These four genes show a high level of conservation with one another, with long stretches of amino acid identity. Two are most highly expressed in BF, whereas the other two show a more complex pattern of regulation. The final set of genes ( Figure  5C) encodes a set of related proteins predicted to have four to five transmembrane domains, four of these genes are tandemly arrayed on chromosome 8. Here the mRNA abundances of the three most closely related genes are higher in the BF samples. In contrast, the first gene in the array and another more divergent, unlinked gene on chromosome 11 do not show this pattern, and have similar or higher expression in PF.

VSGs and ESAGs
The T. brucei strain 927 genome contains approximately 1600 VSG genes (or pseudogenes), but each BF trypanosome expresses only a single VSG, which covers the surface of the parasite in a dense coat. Although T. brucei possesses ~20 VSG ESs (located at telomeres of megabaseand intermediate-sized chromosomes) [29], the expressed VSG gene encoding the surface coat protein is located in the sole active ES. In BF, transcription initiates in all ESs, but attenuates rapidly in the inactive ESs, never reaching the downstream genes including the resident VSG [30].   [31]. A relatively small number of apparently functional VSG genes exist on the 11 megabase-sized chromosomes in T. brucei. The minichromosomes also contain a reservoir of apparently functional VSG genes, but only a few have been sequenced. In contrast, most VSG genes reside in sub-telomeric arrays that are comprised of pseudogenes (which were not included on these microarrays) and atypical VSG genes, which encode proteins that are neither clearly pseudogenes nor clearly functional [11]. The pseudogenes provide the fuel for generating novel VSG genes by mosaic gene conversion during antigenic variation, particularly later in infection [32]. The VSG-related VR genes are located not in the telomeric ESs or sub-telomeric arrays, but rather typically reside in chromosome-internal strandswitch regions and lack the 70-bp repeats typically found upstream of VSG genes [11,32]. The telomeric ESs and sub-telomeric VSG arrays also contain hundreds of ESAGs, many of which are pseudogenes. However, a number of genes related to ESAGs (GRESAGs) have chromosomal-internal location (the nomenclature discriminating ESAGs and GRESAGs was not consistently applied as genes were named).
The microarray design used in this study, contained probes for 74 VSGs, 70 atypical VSGs, and 46 VSGs that were unclassified on VSGdb [33]; 21 sub-telomeric ESAGs, 104 chromosome-internal ESAGs and GRESAGs, as well as 17 ESAGs from three T. brucei strain 427 ESs (no T. brucei strain 927 ESs have been annotated to date). This VSG and ESAG subset of genes was represented by a total of 357 probe-sets. Even though individual parasites express only one ES (containing a single VSG and ~10 ESAGs) at a time, since the parasites have been maintained without regard for antigenic type, we expected that there would be diverse set of VSG genes showing some expression at the population level. In addition, we expected that expression of these VSGs and ESAGs would vary between biological replicates, and indeed, a subset of VSG and ESAGs showed considerable variation in BF, but not PF ( Figure 6A), probably reflecting antigenic variation within these populations. Thus, subsequent analyses were carried out on the 15 individual samples rather than on the mean of the biological conditions (see Additional file 8 for gene level data).
Hierarchical clustering of the 357 probe-sets (after log 2transformation of the normalized expression values) allowed us to define four distinct patterns of VSG gene and ESAG expression (marked A-D in Figure 6B). Interestingly, the distribution of VSG genes and ESAGs from different genomic locations within each group differed markedly (see Figure 6B and 6C). Group A contained a large number (137) of VSGs not expressed in any sample, or only at low levels in some BF samples, exemplified by gene 1 in Figure 6D. All these genes were located within sub-telomeric clusters and were likely not transcribed at any stages, except when translocated to the active expression site in small sub-populations of BFs. This group also included five ESAGs from T. brucei 427 ESs that presumably either reside in inactive expression sites or are not present in T. brucei 927.
A second group (B) contained 34 VSG genes and 54 ESAGs, which were expressed at substantially higher (but still relatively moderate) levels in BF and generally low levels in PF. Many of these showed variable expression levels in different biological replicates of the BF samples, indicative of expression from active ESs in sub-populations of BF. This group contained VSG and VR genes from sub-telomeric clusters (genes 2 and 3, in Figure 6D), as well as from chromosomal-internal locations (mostly VRs, e.g. gene 4). It also contained ESAGs and GRESAGs from the 427 ES, sub-telomeric clusters and chromosomal-internal loci. Of particular interest are several ESAG9 genes that are up-regulated only in stumpy BF (as discussed above). While this group of genes has many of the hallmarks of canonical VSG/ESAG expression from ESs, it should be noted that in many cases their signal levels in PF were substantially above background; suggesting that the genes are actively transcribed in PF, but the mRNAs are less stable than in BF.
Unexpectedly, a group (C) of 34 sub-telomeric VSG and nine ESAG genes showed variable expression levels in both BF and PF. The function of these VSG genes is unclear, since they appear to encode both typical and atypical VSGs. In particular, one group of four tandemlylinked VSG genes from an allele-specific region of chromosome 5 showed highest expression in PF-log cells (see gene 5 in Figure 6D). Group C contains both sub-telomeric and internal genes encoding ESAGs 2, 3, 5, and 11. Interestingly, several ESAG11-related genes and a VR are located in a tandem array on chromosome 4, where they are interspersed with genes encoding hypothetical proteins. Since the hypothetical proteins show very similar expression patterns to the adjacent ESAG11-related genes, at least some may simply represent 3' UTRs of the neighboring genes. Interestingly, this cluster is located between rRNA and tRNA gene clusters, and would not be expected to be transcribed, since it appears to lack the modified chromatin found at typical RNA polymerase II transcription initiation sites [15,16]. The signal levels for all these genes is modest (<3000), and are lowest in stationary phase, suggesting that they may merely represent increased "background" transcription due their proximity to the actively transcribed RNA genes. However, this does not rule out functionality of this set of putative genes.
The final group (D) of VSG and ESAG genes were expressed in all life cycle stages. All nine VSG/VR genes in this group are located in chromosomal-internal loci: four are annotated as VRs, four are atypical VSGs and one is uncategorized. mRNA for several of the VR genes has previously been detected in PF using PCR [32]. One of the VRs shows highest expression in PF-stat (gene 7 in Figure  6D). Three of the atypical VSG genes show similar expres-Cluster analysis of ESAG and VSG gene expression sion in all stages (e.g., gene 6 in Figure 6D). These genes, Tb927.4.5400, Tb927.4.5420 and Tb927.4.5430, along with a 4 th gene identical to Tb927.4.5420, are tandemlylinked to form a small cluster just 5' (and on the opposite strand) to the sub-telomeric VSG cluster at the "right" end of chromosome 4. Interestingly, this cluster of genes is immediately downstream of a convergent strand-switch region that appears to contain an RNA polymerase transcription initiation site in both BF and PF [16]. A large number of ESAGs are also expressed in all life cycle stages; of these most are chromosomal-internal GRESAG4 genes that have been shown previously to be expressed in PF [34]. However, two ESAG4s and an ESAG7 from the 427 ES show this expression pattern, as do six sub-telomeric ESAGs (two encode ESAG3, two ESAG5, one ESAG4 and one ESAG9-like). The functional significance of their expression in PF is unknown.
Of the 215 VSG genes examined, 43 showed expression levels above the bottom quartile (~4500) of all genes in at least one BF sample. These included 10 classified as encoding functional VSGs, and eight that were unclassified, but also included eight encoding atypical VSGs and 17 VRs. From these data it is apparent that at least some atypical VSGs are expressed and hence likely to be functional. Indeed a query on GeneDB for VSGs annotated as being detected in proteomic analysis of BF [35] yielded seven genes, two of which are atypical VSGs. Only nine of the 43 VSG genes noted above were expressed below the 5 th percentile (~1300) in PF. These included five encoding typical VSGs (and two that were uncharacterized), but one gene encoded an atypical VSG, and one VR gene also had this expression pattern. Thus, these data suggest that the functional diversity of VSG and VR genes is likely more complex that currently appreciated.

Conclusion
The results obtained in our study for genes previously shown to be differentially expressed in BF versus PF, including those used as controls in a previous microarray analysis [12], allow us to critically assess the validity of our analysis. As expected, mRNAs corresponding to pyruvate kinase 1, metacaspase 3, isoforms of ISG64, ISG65, and ISG75, VSG glycophosphatidyl inositol phospholipase, major surface protease MSP (also known as GP63) isoforms A and C, hexokinase I and phosphoglycerate kinase isoform C all showed increased levels in cBF; while EP2 and EP-3-2 procyclins, a trans-sialidase, phosphoenolpyruvate carboxykinase, cysteine-rich acidic integral membrane protein (CRAM), a flagellar adhesion protein, corset protein 17, and CAP5.5 [36] all showed increased signal in PF-log samples, although the changes generally were less dramatic than those seen on Northern blots and in the previous microarray analyses. Seven genes previously used as differentially expressed controls in [12] showed changes less than 2-fold in our hands. Four showed more modest changes and one (encoding histone H3) did not change between cBF and PF-log, although it was down-regulated in PF-stat. Finally, recent data suggest that two of those genes may not be regulated between these stages [36,37]. Thus, there is good agreement between the current results and previous work. Since eight probes were used for most genes in our study, it is likely that any changes we see are highly reliable. The somewhat muted differences could be due to a lower sensitivity of the microarrays or to strain variations [38].
The data we obtained shows that despite the lack of transcriptional control for most genes, gene expression is finely tuned during T. brucei development. While most of changes in mRNA abundance were relatively modest, some varied by 10-to 100-fold, and over one-fourth changed at least 2-fold between the different conditions. Many of these changes were easy to reconcile with known changes in parasite biology: the transition from dividing to non-dividing forms affects genetic functions such as transcription and translation and the transition from mammalian to insect stages affects metabolism, surface proteins and transporters. A moderate number of genes encoding proteins with functions associated with RNA were differentially expressed, perhaps reflecting the important role these play in post-transcriptional regulation of gene expression. It is perhaps surprising that relatively few genes encoding protein or lipid kinases and phosphatases were highly regulated, although this may reflect a predominant use of phosphorylation pathways to modulate the activities of these enzymes themselves.
A recently published set of analyses of the transcriptomes of the four developmental stages of Trypanosoma cruzi indicated that up to 50% of genes show a statistically significant difference in mRNA levels during parasite development [39], a proportion considerably higher than that which we observed in T. brucei. It is likely that as other stages of the T. brucei life cycle are examined, the proportion of genes determined to show significant regulation during development will rise. Our findings can also be compared to studies in related parasite Leishmania wherẽ 3-10% of the genes showed changes of at least 2-fold between rapidly-growing procyclic promastigotes, nondividing metacyclic promastigotes, and the more slowlygrowing amastigote forms [40][41][42][43][44][45]. While the studies in Leishmania identified a heterogeneous, but overlapping, set of mRNAs showing regulation in abundance, several similarities with the present study emerge. These include down-regulation of genes encoding translational machinery (including tRNAs) in slower growing stages, as well as substantial changes in mRNA levels for many metabolic enzymes.
Interestingly, our results show very little difference between cultured and animal-derived T. brucei BF. Other studies using stresses such as exposure to tunicamycin or reducing agents, reduced serum or genetic manipulation of specific genes showed few changes in RNA abundances, suggesting that BF parasites have little ability to alter their transcriptome following unexpected environmental changes [12]. This contrasts with some reports of substantial differences between axenic and animal-derived amastigotes of Leishmania [46]. It remains to be seen whether this difference is due to the extracellular nature of T. brucei, methodological difficulties of extracting amastigotes from macrophages, or novel environmental cues detected by the Leishmania parasites.
Use of the Nimblegen arrays also allowed us to provide an assessment of the relative abundance of transcripts within a biological condition, over at least two orders of magnitude. While many of the highly expressed transcripts encode well-characterized proteins, there still remain many that have received little or no attention. The data presented here provide an important foundation for researchers interested in elucidating the unusual biology of the parasite or developing new interventions to combat the lethal disease they cause. It is now important to further understand the mechanisms involving regulation of translation, protein activity, and protein turnover to realize the full extent of developmental regulation in T. brucei. This is underscored by recent work comparing changes in the transcriptome and proteome during Leishmania differentiation, which suggests that there is a relatively poor correlation between the two and that translation and protein stability play important roles in regulation of gene expression in trypanosomatids [47](Lahav et al., manuscript in preparation). However, work in T. cruzi suggests a more robust relationship [39], perhaps pointing to different modes of gene regulation between the Leishmania and Trypanosoma genera.

Parasites and RNA preparation
The pleiomorphic T. brucei strain TREU927/4 was used for all analyses. All procedures involving vertebrate animals followed protocols approved by the institutional IACUC. For isolation of slender BF and stumpy BF, Wistar rats (retired-male breeders) were immunosuppressed with 750 rads. Animals were immediately infected with 1-2 × 10 8 parasites by intraperitoneal injection and the parasitemia was monitored daily beginning on day 2 postinfection. For slender BF populations, animals were sacrificed for harvest on day 4, when the parasitemia was still increasing. For stumpy BF populations, animals were sacrificed when the morphologically stumpy population in the blood smear exceeded 70%. Blood from infected animals was harvested with 10 mg/ml heparin in phosphate-buffered saline (PBS) with 10 mM glucose. The samples were then centrifuged at 900 × g for 10 min and the buffy coat containing the parasites extracted. To remove the remaining red blood cells, the buffy coat was loaded onto to a DE53 (Schleicher and Schuell) column that had been equilibrated with room temperature PBS plus glucose, hypoxanthine (0.35 g/ml) and L-cysteine (80 g/ml). Parasites were eluted from the column with the same buffer and pelleted at 900 × g for 10 min. RNA was immediately isolated as described below.
Stumpy BF begin to transform to PF quickly and synchronously, as determined by expression of EP procyclin on their surface four hours after induction, while neither slender or intermediate BF will express EP procyclin within this time frame [5,48]. To assess EP procyclin expression, ~5 × 10 7 purified parasites were pelleted and resuspended in DTM medium [49] with 6 mM freshly prepared cis-aconitate and incubated for 4 hours at 25°C. Washed cells were fixed in PBS with 3.7% formaldehyde for 10 min at room temperature. Fixed cells were washed once with PBS and stored in 100 l PBS at 4°C for up to 24 hours. Cells were spotted on poly-L-lysine coated slides and, after blocking with 10% goat serum, surfacelocalized procyclin detected with monoclonal antibodies with a mixture of two monoclonal antibodies against EP procyclin (antibody 16, directed against the glutamineproline repeats of EP procyclin and antibody 418, also directed against EP procyclin) [50] at a 1:500 dilution for 1 hour. Slides were processed and cells visualized as described [51]. PF strain 29.13 and cBF single marker strain were similarly stained as positive and negative controls respectively. For all counts, at least one hundred cells were scored.
HMI-9 medium [52] with 10% fetal bovine serum was used for continuous culture of cBF. The three biological replicates were prepared several months apart. Parasites were seeded into 400 ml of media and grown to ~8 × 10 5 cells/ml, to obtain a log-phase population. Cells were pelleted and RNA isolated as described below.

Generation of PF from stumpy BF
Stumpy BF obtained from an infected rat by cardiac puncture were diluted in SDM-79 medium [53] containing 15% fetal bovine serum. The sample was centrifuged at 200 × g for 10 min to pellet the majority of the blood cells, while leaving the majority of parasites in the supernatant. The supernatant was then transferred to a flask and cisaconitate was added to 6 mM to induce differentiation to PF at 26°C. After 2 hours, most of the remaining blood cells had settled to the bottom and the upper phase of the medium was transferred to a fresh flask and allowed to incubate overnight. The next day, the cells were again transferred to a fresh flask and the cultures were moni-tored daily until parasites started growing. The growing PF expressed EP procyclin in the immunostaining assay described above. For PF-log samples, cultures were grown to 4 × 10 6 cells/ml, split 1:1, and allowed to grow an additional 24 hours before harvest at densities of 6-7 × 10 6 cells/ml (see Additional File 1). For PF-stat samples, cultures were started at 10 6 cells/ml and cell number and integrity was monitored daily beginning day 3. RNA was isolated on day 5, the first day without any increase in cell density (at about 7.5 × 10 7 cells/ml). The next day cultures showed a significant increase in the proportion of dead cells.

Isolation of RNA
Cell pellets were resuspended in 1-2 ml TRIzol (Invitrogen), with a maximum of 5 × 10 8 cells per ml TRIzol. RNA was then isolated according to the manufacturer's directions. The quality of the RNA was verified by running on an Agilent 2100 Bioanalyzer. RNA was sent to Nimblegen for cDNA synthesis using oligo-dT priming, labeling and hybridization to oligonucleotide arrays. Because of the method of priming, the expression levels of RNAs, mRNAs and mitochondrially-encoded transcripts cannot be directly compared. However, both edited and nonedited mitochondrial transcripts of T. brucei have poly(A) tails [54], although the tails are longer for edited transcripts [55]. Because transcripts from genes that do not encode proteins (e.g. snRNAs, tRNAs and snoRNAs) are usually not polyadenylated, signals corresponding to most of these genes were low, and probably resulted from limited priming from oligoA tracts.

Array design
The nucleotide sequence of 8530 protein-coding sequences (CDS) and 559 RNA genes predicted from the T. brucei 927 genome assembly (version 4) was obtained from the GeneDB ftp site (genes annotated as "hypothetical unlikely", pseudogenes, or residing on intermediatesized chromosomes were excluded) and provided to Nimblegen for array design, aiming for eight probes per CDS and three probes per RNA (probes were 60 bp). Since the strain 927 telomeres were not sequenced, we also included 50 CDSs predicted from three BACs from strain 427 telomeric expression sites. In additional, the two RNA genes and 18 CDSs from maxicircle (mitochondrial) DNA [GenBank:M94286] and the corresponding edited transcripts were included in the array design.
To identify probes that would likely hybridize with more than one gene, we utilized CROSS_MATCH [56]. All probes to the CDS and RNA dataset above, as well as the entire T. brucei 927 genome sequence, were tested using a min_match of 35 and min_score of 55. Gene-specific probes were defined as those that had no more than one high-quality match (i.e. 35 contiguous perfectly matched nucleotides with no more than 5 mismatches per probe) against the entire genome. All probes that were not specific to a single gene (or set of near-identical genes) were removed from subsequent analyses. This resulted in 8004 probe-sets corresponding to nuclear protein-coding genes, 106 corresponding to nuclear RNA genes, 14 corresponding to maxicircle protein-coding genes and 2 corresponding to maxicircle rRNA genes (see Additional file 2). Of these, 345 probe-sets detected more than one identical (or near-identical) protein-coding gene; a few of these probe-sets detected a pseudogene or "hypothetical, unlikely" gene in addition to the original gene of interest. This was most common for VSGs. For simplicity, in this communication we refer to all genes detected by a probeset as "a gene". In the final analyses, 93% of the CDS probe-sets contained eight probes, while 54% of the RNA genes were represented by three probes. Each 60 bp probe was randomly assigned to three different spots on the array.
Only 409 T. brucei genes could not be analyzed (see Additional file 9); these included mosaic genes, as well as those for which probes could not be designed or for which all probes matched other (mostly unannotated) regions of the genome. Most corresponded to RNA genes, leaving only 124 protein-coding genes without a corresponding probe-set.

Analysis of microarray data
The probe intensity signals from the 15 microarray chips (three biological replicates by five conditions), as provided by Nimblegen, were subjected to quantile normalization to account for non-biological signal intensity variation across the chips [57]. This data has been deposited with the Gene Expression Omnibus (GEO) database (Accession no.: GSE18049), and has also been provided to TriTrypDB http://www.tritrypdb.org for public access. A single gene-level value for each biological replicate was determined by calculating a weighted mean of all probes in each probe-set using the Tukey biweight formula, which minimizes the influence of outlier values. The statistical significance of signal changes between samples was assessed by either pair-wise or multiclass tests from the Significance Analysis of Microarrays (SAM) software package [58], as appropriate, using all three biological replicates for each condition. We set the q-value to 5, yielding a false discovery rate of 5% for the set of genes identified. There was good agreement between genes showing a 2-fold or greater change assessed by the SAMcalculated mean and those identified using a Tukey biweight mean of the biological replicates. Pair-wise analyses considered both values, accepting all genes with at least a 2-fold change by one method and at least a 1.7-fold change by the other method.
Several different cluster analyses were performed using the MeV component of the TM4 software package [59]. The three biological replicates were combined by Tukey biweight function to give a single gene-level value for each of the five conditions and hierarchical clustering [60] was carried out on all 8110 probe-sets using the following parameters: Gene Tree selection only; Gene Leaf Order optimization; Euclidean Distance metric and complete linkage clustering. In order to focus on genes showing the most variation between biological conditions, we selected those in which at least one condition showed >2-fold deviation from the mean of all five biological conditions and q-value of < 5% in multi-class SAM analysis, and excluded all VSG and VSG-related (VR) genes and ESAG/ GRESAGs. The log 2 -transformed fold-values from the resultant 534 probe-sets were K-median clustered [61] by genes only using the Euclidian distance and 50 iterations, with hierarchical clustering of genes within the nine clusters using the parameters described above. The 357 probesets representing VSG/VR genes and ESAG/GRESAGs were separately analyzed by hierarchical clustering of values (log 2 -transformed Tukey biweight means) for the individual biological samples for all five different conditions, using the same parameters as above. The HCL tree was split into six clusters and two of these clusters further divided two sub-clusters. The clusters and sub-clusters were then combined into four different groups based on similarities in expression pattern.
All protein-coding genes were manually assigned (based on their GeneDB product description and literature review) into 17 functional categories that are particularly relevant to trypanosomatid parasite biology. These included: -DNA-associated (histone-related or histone modifying, proteins involved in DNA replication, repair, or chromatin remodeling, proteins with DNA binding motifs, and nucleases) -ESAGs/GRESAGs -Interacting (proteins with interaction motifs such as zinc fingers, leucine rich repeats, PX domains, PH domains, and WD40 domains but with no other attributed function) -Metabolism (metabolic enzymes) -Oxidative stress (peroxidases, peroxiredoxins, superoxide dismutases, trypanothione metabolism) -Phosphorylation (protein and lipid kinases and phosphatases) -Protease-related (peptidases, proteases, protease-related, and ubiquitin pathway) -Protein folding (chaperones, proteins involved in protein folding or unfolding) -Protein transport/modification (proteins mediating protein trafficking within the cell including the endomembrane system and organelles or involved in modification of proteins within the endomembrane system such as glycosylation) -RNA-associated (RNA binding proteins, helicases, nucleases) -Transcription (RNA polymerase subunits and transcription factors) -Translation (proteins involved in ribosome biogenesis or translation including nucleolar proteins, ribosomal proteins, tRNA synthetases and tRNA modifying enzymes) -Transporter (membrane proteins transporting small molecules) -VSG/VR -Other (any protein with functional annotation other than those above, plus those with known subcellular location discussed below) -Unknown, conserved (those annotated in GeneDB as "hypothetical protein, conserved and not assigned to any of the categories above) -Unknown, T. brucei-specific (those annotated in GeneDB as "hypothetical protein" and not assigned to any of the categories above) A subset of proteins (~900) were assigned to subcellular location categories on the basis of GeneDB annotation, literature surveys, or proteomic analyses. Assignment to the cytoskeleton included those identified in the flagellar proteome [13], while those with mitochondrial location included those identified in the high-confidence mitochondrial proteome [14]. Other assignments were less exhaustive and focused on differentially expressed genes. Surface/secreted proteins included those known to be secreted or localized to the parasite surface based on experimental analysis (ESAGs, GRESAGs, transporters, and VSGs were not included). Both functional categories and location were used to group genes for different analyses, as indicated in the Figure Legends. Additional file 2 provides gene-level information on gene categories, clus-