Properties of non-coding DNA and identification of putative cis-regulatory elements in Theileria parva
© Guo and Silva; licensee BioMed Central Ltd. 2008
Received: 04 July 2008
Accepted: 03 December 2008
Published: 03 December 2008
Parasites in the genus Theileria cause lymphoproliferative diseases in cattle, resulting in enormous socio-economic losses. The availability of the genome sequences and annotation for T. parva and T. annulata has facilitated the study of parasite biology and their relationship with host cell transformation and tropism. However, the mechanism of transcriptional regulation in this genus, which may be key to understanding fundamental aspects of its parasitology, remains poorly understood. In this study, we analyze the evolution of non-coding sequences in the Theileria genome and identify conserved sequence elements that may be involved in gene regulation of these parasitic species.
Intergenic regions and introns in Theileria are short, and their length distributions are considerably right-skewed. Intergenic regions flanked by genes in 5'-5' orientation tend to be longer and slightly more AT-rich than those flanked by two stop codons; intergenic regions flanked by genes in 3'-5' orientation have intermediate values of length and AT composition. Intron position is negatively correlated with intron length, and positively correlated with GC content. Using stringent criteria, we identified a set of high-quality orthologous non-coding sequences between T. parva and T. annulata, and determined the distribution of selective constraints across regions, which are shown to be higher close to translation start sites. A positive correlation between constraint and length in both intergenic regions and introns suggests a tight control over length expansion of non-coding regions. Genome-wide searches for functional elements revealed several conserved motifs in intergenic regions of Theileria genomes. Two such motifs are preferentially located within the first 60 base pairs upstream of transcription start sites in T. parva, are preferentially associated with specific protein functional categories, and have significant similarity to know regulatory motifs in other species. These results suggest that these two motifs are likely to represent transcription factor binding sites in Theileria.
Theileria genomes are highly compact, with selection seemingly favoring short introns and intergenic regions. Three over-represented sequence motifs were independently identified in intergenic regions of both Theileria species, and the evidence suggests that at least two of them play a role in transcriptional control in T. parva. These are prime candidates for experimental validation of transcription factor binding sites in this single-celled eukaryotic parasite. Sequences similar to two of these Theileria motifs are conserved in Plasmodium hinting at the possibility of common regulatory machinery across the phylum Apicomplexa.
Species in the genus Theileria are the causative agents of lymphoproliferative diseases in cattle. The high mortality and morbidity in cattle associated with these diseases cause devastating socio-economic losses . The species that cause the largest economic burden are T. parva, which causes East Coast fever, and T. annulata, which causes tropical theileriosis. Both species are transmitted by ticks and have complex intracellular life-cycles . Two intracellular stages follow the introduction of sporozoite stage into mammalian hosts, the intra-lymphocytic schizont stage and the intra-erythrocytic piroplasm stage. The schizont stage has a unique ability to cause infected host cells to proliferate indefinitely, resulting in a leukemia-like phenotype . Stage differentiation takes place at the time of parasite transmission to and from hosts, as well as during their establishment and amplification within the host. It has been shown that the molecular regulation of stage differentiation could be targeted for the development of new disease control strategies . One such target are the regulators of gene expression, since they may determine the progression through life-cycle stages. However, little is known about the extent of, and the elements involved in, transcriptional regulation in these protozoan parasites.
Theileria species are among several thousand unicellular eukaryotic parasites in the phylum Apicomplexa, whose members include Plasmodium falciparum, the most deadly causative agent of malaria, and Toxoplasma gondii, one of the most successful parasites, infecting more than 30% of the human population and imposing a severe threat to immuno-compromised individuals. Apicomplexans are thought to have reduced transcriptional machinery relative to multicellular eukaryotes. The highly biased nucleotide composition of the apicomplexan genomes sequenced to date hamper the detection of bona fide regulatory elements , thus making it difficult to assess the extent of that reduction. Classical eukaryotic promoter elements such as the TATA-box and the CAAT-box appear to be absent in T. gondii and P. falciparum . General transcription factors such as TFIIA and most of the TATA-binding protein associated factors are not found in apicomplexan genomes . Instead, non-canonical regulatory motifs are correlated with gene expression in T. gondii . However, recent studies that take into account the genome composition and/or the timing of gene expression are uncovering novel conserved sequence motifs, leading to a considerable expansion of the repertoire of known and putative transcription regulators in this phylum [9–14]. In T. parva, a massively parallel signature sequencing (MPSS) study indicates that the majority of genes are transcriptionally active in the schizont stage, but their transcription may not be stringently regulated and, instead, vary stochastically between different host cells or time points. Active regulation of transcription was observed only at certain classes of loci . In T. annulata, mobility shift assays have identified an element upstream of the TamS1 gene that is a site of DNA-protein interactions during differentiation to the merozoite .
The complex life-cycle of Theileria poses a serious challenge to the development of a workable laboratory system, thus making it difficult to study the regulatory mechanism and developmental biology of this organism. Recently, the genome sequences of T. parva and T. annulata were published [17, 18], enabling the study of the unique biological characteristics of Theileria parasites using systematic and global bioinformatics approaches. The genomes of T. parva and T. annulata are ~8 Mb long, with approximately 4000 genes each, identified through both automated and manual annotation. The gene density in these two genomes is fairly high, with the annotated coding sequences comprising ~70% of the nuclear DNA, introns ~10% and the remaining 20% consisting of intergenic regions (IGRs) [17, 18]. The short length of Theileria IGRs, which average ~400 base pairs (bp), again suggests a different mechanism of transcriptional regulation from those found in multi-cellular eukaryotes. More importantly, these compact IGRs allow us to assess the feasibility of finding regulatory sequences with an exhaustive scan of the non-coding segment of the genome, and in the absence of comprehensive gene expression data.
In the current work, we try to understand the evolutionary forces that determine the characteristic of non-coding regions in Theileria, such as length, nucleotide composition and degree of conservation. Selective constraint is calculated based on alignment of orthologous IGRs and introns of T. parva and T. annulata. In particular, this study aimed to test whether sequence conservation in general, and conserved sequence motifs in particular, are most common in IGRs flanked by genes in head-to-head orientation, as expected if transcriptional regulation is an important component in the regulation of gene expression in Theileria. We apply a de novo motif discovery algorithm to identify putative cis-regulatory elements in IGRs and compare their conservation in two species. Candidate motifs are then characterized based on their location relative to transcription start sites and the function of neighboring genes in T. parva, their similarity to known transcription factor binding sites, and their distribution pattern in different non-coding genomic regions. In this first comprehensive study of non-coding sequence evolution and motif discovery in Theileria, we have demonstrated that selection favors short introns and IGRs, and identified conserved sequence motifs whose role in gene regulation can now be tested by experimental approaches.
Characterization of non-coding DNA in T. parva and T. annulata
Property of orthologous intergenic regions (IGRs) and introns
T. parva IGRs
356.6 ± 9.8
23.3 ± 0.17
35.6 ± 0.59
T. annulata IGRs
336.2 ± 9.1
21.9 ± 0.19
36.8 ± 0.57
T. parva Introns
75.6 ± 1.83
21.9 ± 0.16
28.4 ± 0.52
T. annulata Introns
63.8 ± 1.21
21.1 ± 0.16
30.3 ± 0.52
The degree of selective constraint was estimated for each high-quality, orthologous intron pair (n = 1487) and IGR pair (n = 990), based on individual global alignments. Our results show the degree of conservation to be higher in IGRs than in introns (Table 1). Interestingly, selective constraint in IGRs of T. parva increases with the number of start codons that flank it (Figure 2C), and decreases with intron ordinal number (Figure 3C). The difference is statistically significant among different classes of IGRs and among introns (p < 0.001, Kruskal-Wallis test). These results point to the accumulation of functional motifs upstream of genes and also in introns that are closer to the 5' end of genes, and are consistent with the assertion of a direct relationship between the length of non-coding regions and the frequency of functional motifs.
Conserved Sequence Motifs and their Biological Relevance
Top five motifs in 5' intergenic regions of T. parva and T. annulata
T. parva Motif 1
T. annulata Motif 1
Next, we investigated the biological relevance for each motif in the context of the function of adjacent genes. A hypergeometric test demonstrated that certain functional categories are significantly enriched among genes downstream of motifs 1 and 3 (p < 0.01), but not of motif 2. Motif 1 is associated with genes involved in protein synthesis, with telomeric ORFs, and with proteins containing signal peptides, while motif 3 is associated with genes related to protein fate (Additional file 1). Using STAMP , we identified known motifs that are most similar to each of the three Theileria motifs. Motif 1 has similarity to a DNA consensus binding site for myeloid zinc finger protein 1 (MZF1), a C2H2 zinc finger transcription factor involved in granulopoiesis, cellular proliferation and oncogenesis  (Figure 4). In addition, motif 1 is identical to the motif ATGGGGC, which has been identified independently in different studies in Plasmodium [11, 14, 25] and which may be preferentially associated with metabolic genes that are highly expressed during the trophozoite stage . For motif 2, STAMP detected a highly similar sequence element that is known to interact with a nuclear protein in a plant species  (Figure 4). It is also nearly identical with a P. falciparum conserved motif TGTGT(G/A)(A/T) which, much like motif 2, has a widespread genome distribution . Motif 3 has significant similarity to the binding site for NF-κB, a family of transcription factors whose activation has been shown to be associated with host invasion in various pathogens, including T. parva  (Figure 4).
The fraction of non-coding DNA in the majority of bacterial and archaeal genomes is between 6% and 14%, but close to 90% in multicellular eukaryotes . Single-celled eukaryotes have a higher proportion of non-coding DNA than prokaryotes, but a much more compact genome than multicellular eukaryotes. The smallest known nuclear genome, that of a chlorarachniophyte nucleomorph, contains 22% intergenic DNA , and the single-celled model organism Saccharomyces cerevisiae, the budding yeast, contains 30% intergenic DNA . Intron length and number varies among taxa, and dramatic difference can be seen across related species. Approximately 40% of genes contain introns in the fission yeast, Schizosaccharomyces pombe, while only 4% of genes have introns in S. cerevisiae . The non-protein coding regions of multi-cellular eukaryotes include the remnants of transposable elements that have lost functionality, ribosomal genes, motifs involved in gene regulation and chromosomal structure and possibly other unknown functions. Only a few eukaryotic unicellular parasites have so far been shown to contain transposable elements [31–34], and many of these organisms are known to lack transposable elements all together [17, 35, 36], which probably explains much of the observed difference in the amount of non-coding DNA between these organisms and other eukaryotes. However, non-coding regions of these small parasitic genomes remain remarkably understudied, and little is known about the forces that shape them.
The genome of Theileria species is highly compact. Non-coding sequences make up ~30% of the genome, as the average length of both introns and IGRs is smaller than that of many eukaryote genomes, and no transposable elements have so far been found. Our results show that a large fraction of non-coding DNA is kept constant due to purifying selection (Table 1). This high conservation rate confirms the functional importance of non-coding sequences in Theileria, which goes beyond a role as passive intergenic spacers. This assertion is further supported by the higher degree of sequence conservation in IGRs that border the 5' end of genes relative to what is observed in IGRs flanked by termination codons, since IGR sequence conservation between species in regions upstream of genes is associated with the presence of regulatory elements . The presence of functionally important motifs at higher frequencies in 5'-5' IGRs relative to 3'-3' IGRs limits the fixation of deletion events in the former regions, which in turn remain longer.
Introns play an important role in gene transcription regulation and mRNA processing, and functional elements are often found in first introns [38, 39]. First introns tend to be longer and more conserved than introns of higher ordinal number, a pattern observed both in mammals  and in invertebrates . Our analyses yielded a similar result in Theileria, extending this pattern to unicellular eukaryotes for the first time. These results also suggest the accumulation of functional elements towards the 5' end of Theileria genes.
The study of transcriptional regulation in apicomplexan parasites has identified some unique features of this large group of protists. Canonical elements seem to be absent from promoter regions, while non-canonical regulatory elements in upstream regions have been found to be involved in the regulation of gene expression. Transcriptome analysis has demonstrated active regulation of transcription in T. parva , but no regulatory elements or transcription factors have been identified so far.
In this study, we found three putative motifs that are present in hundreds of copies throughout the genome. Two of them, motifs 1 and 3 in T. parva, are preferentially located in the 60 nucleotides upstream of TSS, suggesting that they may be transcription factor binding sites in this species. Motif 1 appears to be enriched near telomere-associated ORFs and signal peptide-containing proteins. While the function of telomeric ORFs in Theileria, so called due to their extreme proximity to telomeres, remains unknown they encode hyper-polymorphic gene families . In Plasmodium, gene families with these characteristics are known to be important to pathogenesis and antigenic variation [42, 43]. In fact, a large fraction of T. parva's telomeric ORFs expressed in the schizont stage contain predicted signal sequences, consistent with their involvement in host-parasite interaction . The current discovery of a putative regulatory element preferentially located in upstream regions of telomeric ORFs may help the functional study and design of molecular tools to manipulate this important group of proteins in Theileria. Based on a MIPS classification of T. parva proteins , we detected an association of this motif with genes involved in protein synthesis, and database searches indicated that it is similar to a DNA consensus binding site for myeloid zinc finger protein 1 (MZF1), a C2H2 zinc finger transcription factor involved in cellular proliferation and oncogenesis . The finding of an identical motif in Plasmodium associated with genes that are highly expressed during phases of rapid cellular multiplication makes it tempting to speculate whether motif 1 may be the binding site for an unidentified transcription factor in Theileria associated with protein synthesis and/or cell division.
Motif 3 appears to be associated with genes that are involved in protein modification, stabilization, degradation, targeting, sorting, translocation, and other protein fate-related functions. It is similar to the binding site for NF-κB transcription factors, which has been shown to be involved in host cell transformation mechanisms in various pathogens, including Theileria . The schizont stage of T. parva induces sustained activation of NF-κB, which regulates the expression of genes involved in immune and inflammatory responses , proliferation , and survival . Pathogens may also utilize the NF-κB system to enhance their own replication, survival, and dissemination within the host . For example, NF-κB binding sites have been found in the enhancer region of the long terminal repeat  and 5' UTR  of human immunodeficiency virus (HIV), through which host NF-κB promotes viral replication and survival. Although no homolog to human NF-κB has been identified in Theileria, it is conceivable that a transcription factor with a similar binding motif to that used by NF-κB might exist in these species. To that effect we searched the T. parva protein-coding genes for peptides with homology to the IPT domain of NF-κB, which is involved in DNA binding. Our BLASTP analysis uncovered a significant match (E = 3.7e-5) to a hypothetical protein, TP02-0125, which has 33% identity and 55% similarity to the DNA-binding domain of NF-κB. A support vector machine-based algorithm predicted TP02-0125 to be a DNA-binding protein . This result, together with the distribution characteristics of motif 3, suggests that attempting the experimental validation of this motif is warranted.
In contrast with the previous two motifs, motif 2 is found throughout non-coding regions, without a localized distribution relative to TSS or specific protein functional classes. However, an almost identical motif has been demonstrated to interact with a nuclear protein in developing rape seeds , suggesting the role of our motif as a binding site for regulatory proteins other than transcription factors. Interestingly, our motif is highly similar to a conserved motif in P. falciparum that also has a widespread genome distribution [14, 51]. While it is possible that this sequence pattern is a characteristic feature of apicomplexan genomes, its function remains unknown.
As more Theileria genome sequences become available, the search for conserved motifs in non-coding sequences will have added power. A comprehensive list of conserved elements may be derived by combining phylogenetic footprinting and de novo pattern matching algorithms. Various experimental approaches are available for the verification of putative regulatory elements, including in vitro protein binding experiments, in vivo DNA structure assays, and reverse genetics methods. In particular electrophoretic mobility shift essays have been documented to work well in apicomplexan systems [9, 11, 16]. Revealing the functional potential of these conserved elements will advance the study of gene regulation in Theileria and possibly lead to the improved control and therapeutics for East Coast Fever and tropical therileriosis in cattle.
The highly compact genome of Theileria seems to result from selection pressure for small introns and IGRs. While, much like in other apicomplexan genomes, classical eukaryotic promoter elements have not been found in Theileria, genome-wide de novo searches identified several conserved sequence motifs in IGRs. Two putative T. parva motifs have localized distribution relative to transcriptional start sites and are preferentially associated with specific protein functions, which is consistent with the hypothesis that they participate in transcriptional regulation in this eukaryotic parasite. The fact that conserved motifs with similar sequence are found in Plasmodium hints at the possibility of common regulatory mechanisms across the phylum Apicomplexa.
An in-house database was developed to store and analyze genome sequences and annotations of two Theileria species. There are 4011 annotated genes in T. parva, with 10408 introns. 5' and 3' UTR information is available for 650 and 545 genes, respectively. The total number of IGRs, defined as complete DNA sequences between start and/or stop codons of two consecutive annotated genes, is 3982 in T. parva. T. annulata has 3784 annotated genes, with 10816 introns and a total of 3738 IGRs. Transfer and ribosomal RNA genes were also excluded from IGRs.
Orthologous clusters (OCs) were created using the publicly available gene annotations of six apicomplexan genomes, namely T. parva, T. annulata, Babesia bovis, Plasmodium falciparum (version released with PlasmoDB 5.0 Beta), Plasmodium yoelii, and Cryptosporidium parvum. Except for P. falciparum, the original genome annotation release was used for each species. Jaccard-filtered OC analysis  was used to construct the final ortholog set, and resulted in 3137 OCs containing at least one gene from each of the two Theileria species (Jaccard filter cutoff at 0.6). OCs with paralogs in Theileria were excluded, and 2904 OCs were retained that include exactly one gene from each species. OCs were also excluded if the ratio of the shortest to the longest of the two Theileria genes was smaller than 0.9. Genes eliminated by this criterion are likely to have an incorrect structural gene annotation in one or both species, or to contain repeats that render sequence alignment questionable. The final set of high-quality OCs contains 1956 genes, with one sequence each in T. parva and in T. annulata.
A high-quality (HQ) set of orthologous non-coding regions was then defined as follows: HQ orthologous IGRs are flanked by HQ OCs at both ends. HQ orthologous introns are those from HQ OCs with the same number of introns in both species; in addition, HQ orthologous introns are flanked by exon regions in which amino acid similarity is ≥ 75% between species. This cutoff was determined empirically and resulted in the elimination of 25% of introns. The final dataset contains 990 pairs of HQ orthologous IGRs and 1487 pairs of HQ orthologous introns.
Orthologous non-coding sequences were aligned using Owen , to generate global alignments consisting of segments with significant sequence similarity (p < 0.001), also called hits, interspersed with segments for which sequence similarity was below that significance threshold. Selective constraint, c, in IGRs and introns was estimated according to Shabalina and Kondrashov . It is defined as the fraction of invariant nucleotides within a sequence segment. Given a similarity (s) within a hit defined as the number of matches divided by the length of the shorter sequence (l short ), and the probability r that a site is identical between two sequences due to non-deterministic reasons, selective constraint within a hit is estimated as for shorter sequence and for longer sequence. In this study, we estimated r from the similarity in third codon positions of 4-fold degenerate amino acids in HQ OCs between the two Therileria species; these should correspond mostly to sites that are identical due to chance or because not enough time has elapsed since the two species split for substitutions to occur. For an IGR or intron having n hits, the selective constraint is defined as the sum, over all n hits, of the total number of constrained nucleotides divided by the length L of non-coding sequence in either species, .
Although computational motif discovery approaches have traditionally been used to find over-represented patterns among co-regulated genes, they may also be applied to large sets of unrelated promoter regions . In the current study, we retrieved up to 300 bases of IGRs flanking the translation start sites of all T. parva and T. annulata genes and used MEME (Multiple EM for Motif Elicitation) to search for over-represented motifs on both DNA strands. The search was done separately for each species. MEME looks for conserved ungapped blocks in a group of sequences using an iterative expectation-maximization algorithm . A background model is used by MEME to calculate the log likelihood ratio and statistical significance of a motif. The model used in our search is a first-order Markov chain derived from all intergenic sequences in T. parva and T. annulata respectively. We assume the presence of either zero or one motif per sequence, with motif width between 5–15 bp. Putative motifs found in two species were compared by the Smith-Waterman local alignment method with similarity score defined by Pearson's correlation coefficient .
Biological Relevance of Putative Motifs
Putative motifs were further investigated in terms of their location distribution in relation to TSS, functional characterization of downstream genes, similarity to known transcription factor binding sites, and occurrence pattern in different non-coding genomic regions. These four analyses were performed in T. parva only. First, we retrieved all T. parva genes with putative TSSs information available and looked at the intersection between these genes and our set of putative motifs. Distance between the first base of a motif and TSS was calculated and binned in 10-bp intervals. Localized distribution of motifs in relation to TSS will provide evidence for the biological relevance of putative motifs.
As multiple functional catagories were tested simultaneously, p-values are corrected for multiple hypotheses testing using a Bonferroni approach. The calculation was done using the function phyper in the statistical language R.
In the third analysis, we identified known motifs with significant similarity to our putative motifs using the web tool STAMP . T. parva motifs were searched against a collection of databases including two comprehensive eukaryotic motif databases JASPAR  and TRANSFAC , drosophila DNase I footprint database FlyReg , plant motif databases PLACE  and AthaMap , prokaryotic motif databases RegTransBase  and DPInteract , as well as yeast motifs predicted by Harbison et al  and MacIsaac et al . Motif similarity was estimated by the Smith-Waterman local alignment method with similarity score defined by Pearson's correlation coefficient. The significance is estimated based on simulated position specific scoring matrix (PSSM) models .
Lastly, we used MAST (Motif Alignment and Search Tool) to find all occurrences of each putative motif in T. parva non-coding regions . The E-value of a match in an IGR or intron to a motif is based on a random sequence model derived from all non-coding sequences in T. parva. Various E-value cutoffs are used to retrieve matches with different statistical significance.
- Norval RAI, Perry BD, Young AS: The epidemiology of Theileriosis in Africa. 1992, London: Academic PressGoogle Scholar
- Dobbelaere DA, Kuenzi P: The strategies of the Theileria parasite: a new twist in host-pathogen interactions. Curr Opin Immunol. 2004, 16 (4): 524-530. 10.1016/j.coi.2004.05.009.PubMedView ArticleGoogle Scholar
- Brown CGD, Stagg DA, Purnell RE, Kanhai GK, Payne RC: Infection and transformation of bovine lymphoid cells in vitro by infective particles of Theileria parva. Nature. 1973, 245 (5420): 101-103. 10.1038/245101a0.PubMedView ArticleGoogle Scholar
- Shiels B, Swan D, McKellar S, Aslam N, Dando C, Fox M, Ben-Miled L, Kinnaird J: Directing differentiation in Theileria annulata: old methods and new posibilities for control of apicomplexan parasites. Int J Parasitol. 1998, 28 (11): 1659-1670. 10.1016/S0020-7519(98)00131-3.PubMedView ArticleGoogle Scholar
- van Noort V, Huynen MA: Combinatorial gene regulation in Plasmodium falciparum. Trends Genet. 2006, 22 (2): 73-78. 10.1016/j.tig.2005.12.002.PubMedView ArticleGoogle Scholar
- Militello KT, Dodge M, Bethke L, Wirth DF: Identification of regulatory elements in the Plasmodium falciparum genome. Mol Biochem Parasitol. 2005, 134 (1): 75-88. 10.1016/j.molbiopara.2003.11.004.View ArticleGoogle Scholar
- Meissner M, Soldati D: The transcription machinery and the molecular toolbox to control gene expression in Toxoplasma gondii and other protozoan parasites. Microbes Infect. 2005, 7 (13): 376-1384. 10.1016/j.micinf.2005.04.019.View ArticleGoogle Scholar
- Kibe MK, Coppin A, Dendouga N, Oria G, Meurice E, Mortuaire M, Madec E, Tomavo S: Transcriptional regulation of two stage-specifically expressed genes in the protozoan parasite Toxoplasma gondii. Nucleic Acid Res. 2005, 33 (5): 1722-1736. 10.1093/nar/gki314.PubMedPubMed CentralView ArticleGoogle Scholar
- Behnke MS, Radke JB, Smith AT, Sullivan WJ, White MW: The transcription of bradyzoite genes in Toxoplasma gondii is controlled by autonomous promoter elements. Mol Microbiol. 2008, 68 (6): 1502-1518. 10.1111/j.1365-2958.2008.06249.x.PubMedPubMed CentralView ArticleGoogle Scholar
- Hackney JA, Ehrenkaufer GM, Singh U: Identification of putative transcriptional regulatory networks in Entamoeba histolytica using Bayesian inference. Nucleic Acid Res. 2007, 35 (7): 2141-2152. 10.1093/nar/gkm028.PubMedPubMed CentralView ArticleGoogle Scholar
- Sunil S, Chauhan VS, Malhotra P: Distinct and stage specific nuclear factors regulate the expression of falcipains, Plasmodium falciparum cysteine proteases. BMC Mol Biol. 2008, 9: 47-10.1186/1471-2199-9-47.PubMedPubMed CentralView ArticleGoogle Scholar
- De Silva EK, Gehrke AR, Olszewski K, León I, Chahal JS, Bulyk ML, Llinás M: Specific DNA-binding by apicomplexan AP2 transcription factors. Proc Natl Acad Sci USA. 2008, 105 (24): 8393-8398. 10.1073/pnas.0801993105.PubMedPubMed CentralView ArticleGoogle Scholar
- Mullapudi N, Lancto CA, Abrahamsen MS, Kissinger JC: Identification of putative cis-regulatory elements in Cryptosporidium parvum by de novo pattern finding. BMC Genomics. 2007, 8: 13-10.1186/1471-2164-8-13.PubMedPubMed CentralView ArticleGoogle Scholar
- Young JA, Johnson JR, Benner C, Yan SF, Chen K, Le Roch KG, Zhou Y, Winzeler EA: In silico discovery of transcription regulatory elements in Plasmodium falciparum. BMC Genomics. 2008, 9: 70-10.1186/1471-2164-9-70.PubMedPubMed CentralView ArticleGoogle Scholar
- Bishop R, Shah T, Pelle R, Hoyle D, Pearson T, Haines L, Brass A, Hulme H, Graham SP, Taracha ELN, Kanga S, Lu C, Hass B, Wortman J, White O, Gardner MJ, Nene V, Villiers EP: Analysis of the transcriptome of the protozoan Theileria parva using MPSS reveals that the majority of genes are transcriptionally active in the schizont stage. Nucleic Acid Res. 2005, 33 (17): 5503-5511. 10.1093/nar/gki818.PubMedPubMed CentralView ArticleGoogle Scholar
- Shiels B, Fox M, Mckellar S, Kinnaird J, Swan D: An upstream element of the TamS1 gene is a site of DNA-protein interactions during differention to the merozoite in Therileria annulata. J Cell Sci. 2000, 113 (Pt 12): 2243-2252.PubMedGoogle Scholar
- Gardner M, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, Ren Q, Paulsen IT, Pain A, Berriman M, Wilson RJ, Sato S, Ralph SA, Mann DJ, Xiong Z, Shallom SJ, Weidman J, Jiang L, Lynn J, Weaver B, Shoaibi A, Domingo AR, Wasawo D, Crabtree J, Wortman JR, Haas B, Angiuoli SV, Creasy TH, Lu C, Suh B, Silva JC, Utterback TR, Feldblyum TV, Pertea M, Allen J, Nierman WC, Taracha EL, Salzberg SL, White OR, Fitzhugh HA, Morzaria S, Venter JC, Fraser CM, Nene V: Genome seqeuence of Theileria parva, a Bovine Pathogen that transforms lymphocytes. Science. 2005, 309 (5731): 134-137. 10.1126/science.1110439.PubMedView ArticleGoogle Scholar
- Pain A, Renauld H, Berriman M, Murphy L, Yeats CA, Weir W, Kerhornou A, Aslett M, Bishop R, Bouchier C, Cochet M, Coulson RM, Cronin A, de Villiers EP, Fraser A, Fosker N, Gardner M, Goble A, Griffiths-Jones S, Harris DE, Katzer F, Larke N, Lord A, Maser P, McKellar S, Mooney P, Morton F, Nene V, O'Neil S, Price C, Quail MA, Rabbinowitsch E, Rawlings ND, Rutter S, Saunders D, Seeger K, Shah T, Squares R, Squares S, Tivey A, Walker AR, Woodward J, Dobbelaere DA, Langsley G, Rajandream MA, McKeever D, Shiels B, Tait A, Barrell B, Hall N: Genome of the host-cell transforming parasite Theileria annulata compared with T. parva. Science. 2005, 309 (5731): 131-133. 10.1126/science.1110418.PubMedView ArticleGoogle Scholar
- Lim LP, Burge CB: A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci USA. 2001, 98 (20): 11193-11198. 10.1073/pnas.201407298.PubMedPubMed CentralView ArticleGoogle Scholar
- Seoighe C, Gehring C, Hurst LD: Gametophytic selection in Arabidopsis thaliana supports the selective model of intron length reduction. PLoS Genet. 2006, 1 (2): e13-10.1371/journal.pgen.0010013.View ArticleGoogle Scholar
- Gaffney DJ, Keightley PD: Genomic selective constraints in murid noncoding DNA. PLoS Genet. 2006, 2 (11): e204-10.1371/journal.pgen.0020204.PubMedPubMed CentralView ArticleGoogle Scholar
- Cooper SJ, Trinklein ND, Anton ED, Nguyen L, Myers RM: Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res. 2006, 16 (1): 1-10. 10.1101/gr.4222606.PubMedPubMed CentralView ArticleGoogle Scholar
- Mahony S, Benos PV: STAMP: a web tool for exploring DNA-binding motif similarities. Nucleic Acids Res. 2007, 35: W253-258. 10.1093/nar/gkm272.PubMedPubMed CentralView ArticleGoogle Scholar
- Morris JF, Hromas R, Rauscher FJ: Characterization of the DNA-binding properties of the myeloid zinc finger protein MZF1: two independent DNA-binding domains recognize two DNA consensus sequences with a common G-rich core. Mol Cell Biol. 1994, 14 (3): 1786-1795.PubMedPubMed CentralView ArticleGoogle Scholar
- Militello KT, Dodge M, Bethke L, Wirth DF: Identification of regulatory elements in the Plasmodium falciparum genome. Mol Biochem Parasitol. 2004, 134 (1): 75-88. 10.1016/j.molbiopara.2003.11.004.PubMedView ArticleGoogle Scholar
- Ericson ML, Muren E, Gustavsson HO, Josefsson LG, Rask L: Analysis of the promoter region of napin genes from Brassica napus demonstrates binding of nuclear protein in vitro to a conserved sequence motif. Eur J Biochem. 1991, 197 (3): 741-746. 10.1111/j.1432-1033.1991.tb15966.x.PubMedView ArticleGoogle Scholar
- Heussler VT, Rottenberg S, Schwab R, Kuenzi P, Fernandez PC, McKellar S, Shiels B, Chen ZJ, Orth K, Wallach D, Dobbelaere DA: Hijacking of host cell IKK signalosomes by the transforming parasite Theileria. Science. 2002, 298 (5595): 1033-1036. 10.1126/science.1075462.PubMedView ArticleGoogle Scholar
- Rogozin IB, Makarova KS, Natale DA, Spiridonov AN, Tatusov RL, Wolf YI, Yin J, Koonin EV: Congruent evolution of different classes of non-coding DNA in prokaryotic genomes. Nucleic Acid Res. 2002, 30 (19): 4264-4271. 10.1093/nar/gkf549.PubMedPubMed CentralView ArticleGoogle Scholar
- Gilson PR, Su V, Slamovits CH, Reith ME, Keeling PJ, McFadden GI: Complete nucleotide sequence of the chlorarachniophyte nucleomorph: Nature's smallest nucleus. Proc Natl Acad Sci USA. 2006, 103 (25): 9566-9571. 10.1073/pnas.0600707103.PubMedPubMed CentralView ArticleGoogle Scholar
- Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG: Life with 6000 Genes. Science. 1996, 274 (5287): 546-567. 10.1126/science.274.5287.546.PubMedView ArticleGoogle Scholar
- Bhattacharya S, Bakre A, Bhattacharya A: Mobile genetic elements in protozoan parasites. J Genet. 2002, 81 (2): 73-86. 10.1007/BF02715903.PubMedView ArticleGoogle Scholar
- Silva JC, Bastida F, Bidwell SL, Johnson PJ, Carlton JM: A potentially functional mariner transposable element in the protist Trichomonas vaginalis. Mol Biol Evol. 2005, 22 (1): 126-134. 10.1093/molbev/msh260.PubMedPubMed CentralView ArticleGoogle Scholar
- Pritham EJ, Putliwala T, Feschotte C: Mavericks, a novel class of giant transposable elements widespread in eukaryotes and related to DNA viruses. Gene. 2007, 390: 1-2. 10.1016/j.gene.2006.08.008.View ArticleGoogle Scholar
- Souza RT, Santos MR, Lima FM, El-Sayed NM, Myler PJ, Ruiz JC, da Silveira JF: New Trypanosoma cruzi repeated element that shows site specificity for insertion. Eukaryot Cell. 2007, 6 (7): 1228-38. 10.1128/EC.00036-07.PubMedPubMed CentralView ArticleGoogle Scholar
- Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, Paulsen IT, James K, Eisen JA, Rutherford K, Salzberg SL, Craig A, Kyes S, Chan MS, Nene V, Shallom SJ, Suh B, Peterson J, Angiuoli S, Pertea M, Allen J, Selengut J, Haft D, Mather MW, Vaidya AB, Martin DM, Fairlamb AH, Fraunholz MJ, Roos DS, Ralph SA, McFadden GI, Cummings LM, Subramanian GM, Mungall C, Venter JC, Carucci DJ, Hoffman SL, Newbold C, Davis RW, Fraser CM, Barrell B: Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002, 419 (6906): 498-511. 10.1038/nature01097.PubMedView ArticleGoogle Scholar
- Carlton JM, Angiuoli SV, Suh BB, Kooij TW, Pertea M, Silva JC, Ermolaeva MD, Allen JE, Selengut JD, Koo HL, Peterson JD, Pop M, Kosack DS, Shumway MF, Bidwell SL, Shallom SJ, van Aken SE, Riedmuller SB, Feldblyum TV, Cho JK, Quackenbush J, Sedegah M, Shoaibi A, Cummings LM, Florens L, Yates JR, Raine JD, Sinden RE, Harris MA, Cunningham DA, Preiser PR, Bergman LW, Vaidya AB, van Lin LH, Janse CJ, Waters AP, Smith HO, White OR, Salzberg SL, Venter JC, Fraser CM, Hoffman SL, Gardner MJ, Carucci DJ: Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature. 2002, 419 (6906): 512-519. 10.1038/nature01099.PubMedView ArticleGoogle Scholar
- Castillo-Davis C, Hartl DL, Achaz G: Cis-regulatory and protein evolution in orthologous and duplicate genes. Genome Res. 14 (8): 1530-1536. 10.1101/gr.2662504.Google Scholar
- Rohrer J, Conley ME: Transcriptional regulatory elements within the first intron of Broton's tyrosine kinase. Blood. 1998, 91 (1): 214-221.PubMedGoogle Scholar
- Chan RY, Boudreau-Lariviere C, Angus LM, Mankal FA, Jasmin BJ: An intronic enhancer containing an N-box motif is required for synapse- and tissue-specific expression of the acetylcholinesterase gene in skeletal muscle fibers. Proc Natl Acad Sci USA. 1999, 96 (8): 4627-4632. 10.1073/pnas.96.8.4627.PubMedPubMed CentralView ArticleGoogle Scholar
- Marais G, Nouvellet P, Keightley PD, Charlesworth B: Intron size and exon evolution in Drosophila. Genetics. 2005, 170 (1): 481-485. 10.1534/genetics.104.037333.PubMedPubMed CentralView ArticleGoogle Scholar
- Bishop R, Gobright E, Nene V, Morzaria S, Musoke A, Sohanpal B: Polymorphic open reading frames encoding secretory proteins are located less than 3 kilobases from Theileria parva telomeres. Mol Biochem Parasitol. 2000, 110 (2): 359-371. 10.1016/S0166-6851(00)00291-7.PubMedView ArticleGoogle Scholar
- Mancio-Silva L, Rojas-Meza AP, Vargas M, Scherf A, Hernandez-Rivas R: Differential association of Orc1 and Sir2 proteins to telomeric domains in Plasmodium falciparum. J Cell Sci. 2008, 121 (Pt 12): 2046-2053. 10.1242/jcs.026427.PubMedView ArticleGoogle Scholar
- Freitas-Junior LH, Hernandez-Rivas R, Ralph SA, Montiel-Condado D, Ruvalcaba-Salazar OK, Rojas-Meza AP, Mâncio-Silva L, Leal-Silvestre RJ, Gontijo AM, Shorte S, Scherf A: Telomeric heterochromatin propagation and histone acetylation control mutually exclusive expression of antigenic variation genes in malaria parasites. Cell. 2005, 121 (1): 25-36. 10.1016/j.cell.2005.01.037.PubMedView ArticleGoogle Scholar
- Karin M, Delhase M: The IkB kinase (IKK) and NF-kB: key elements of proinflammatory signaling. Semin Immunol. 2000, 12 (1): 85-98. 10.1006/smim.2000.0210.PubMedView ArticleGoogle Scholar
- Joyce D, Albanese C, Steer J, Fu M, Bouzahzah B, Pestell RG: NF-κB and cell-cycle regulation: the cyclin connection. Cytokine Growth Factor Rev. 2001, 12 (1): 73-90. 10.1016/S1359-6101(00)00018-6.PubMedView ArticleGoogle Scholar
- Kucharczak J, Simmons MJ, Fan Y, Gelinas C: To be, or not to be: NF-kB is the answer-role of Rel/NF-kB in the regulation of apoptosis. Oncogene. 2003, 22 (56): 8961-8982. 10.1038/sj.onc.1207230.PubMedView ArticleGoogle Scholar
- Tato CM, Hunter CA: Host-pathogen interactions: subversion and utilization of the NF-κB pathway during infection. Infect Immun. 2002, 70 (7): 3311-3317. 10.1128/IAI.70.7.3311-3317.2002.PubMedPubMed CentralView ArticleGoogle Scholar
- Roulston A, Lin R, Beauparlant P, Wainberg MA, Hiscott J: Regulation of human immunodε ficiency virus type 1 and cy tokine gene expression in myeloid cells by NF-κB/Rel transcription factors. Microbiol Rev. 1995, 59 (3): 481-505.PubMedPubMed CentralGoogle Scholar
- Al-Harthi Λ, Roebuck KA: Human immunode ficiency virus type-1 transcription: role of the 5'-untranslated leader regi on. Int J Mol Med. 1998, 1 (5): 875-881.PubMedGoogle Scholar
- Kumar M, Gromiha MM, Raghava GPS: Ide ntification of DNA-binding proteins using support vector machines and evolu tionary profiles. BMC Bioinformatics. 2007, 8: 463-10.1186/1471-2105-8-463.PubMedPubMed CentralView ArticleGoogle Scholar
- Imamura H, Persampieri JH, Chuang JH: Sequences conserved by selection across mouse and human malaria species. BMC Genomics. 2007, 8: 372-10.1186/1471-2164-8-372.PubMedPubMed CentralView ArticleGoogle Scholar
- El-Sayed NM, Myler PJ, Blandin G, Berriman M, Crabtree J, Aggarwal G, Caler E, Renauld H, Worthey EA, Hertz-Fowler C, Ghedin E, Peacock C, Bartholomeu DC, Haas BJ, Tran AN, Wortman JR, Alsmark UC, Angiuoli S, Anupama A, Badger J, Bringaud F, Cadag E, Carlton JM, Cerqueira GC, Creasy T, Delcher AL, Djikeng A, Embley TM, Hauser C, Ivens AC, Kummerfeld SK, Pereira-Leal JB, Nilsson D, Peterson J, Salzberg SL, Shallom J, Silva JC, Sundaram J, Westenberger S, White O, Melville SE, Donelson JE, Andersson B, Stuart KD, Hall N: Comparative genomics of Trypanosomatid parasitic protozoa. Science. 2005, 309 (5733): 404-409. 10.1126/science.1112181.PubMedView ArticleGoogle Scholar
- Ogurtsov AY, Roytberg MA, Shabalina SA, Kondrashov AS: OWEN: aligning long collinear regions of genomes. Bioinformatics. 2002, 18 (12): 1703-1704. 10.1093/bioinformatics/18.12.1703.PubMedView ArticleGoogle Scholar
- Shabalina SA, Kondrashov AS: Pattern of selective constraint in C. elegans and C. brigsae genomes. Genet Res. 1999, 74 (1): 23-30. 10.1017/S0016672399003821.PubMedView ArticleGoogle Scholar
- Shabalina SA, Ogurtsov AY, Kondrashov VA, Kondrashov AS: Selective constraint in intergenic regions of human and mouse genomes. Trends Genet. 2001, 17 (7): 373-376. 10.1016/S0168-9525(01)02344-7.PubMedView ArticleGoogle Scholar
- Ohler U, Liao G, Niemann H, Rubin G: Computational analysis of core promoters in the Drosophila genome. Genome Biol. 2002, 3 (12): 1-0087. 10.1186/gb-2002-3-12-research0087.View ArticleGoogle Scholar
- Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Second Int Conf Intelligent Systems for Mol Bio: August 1994; Menlo Park, CA. 1994, AAAI Press, 28-36.Google Scholar
- Sandelin A, Wesserman WW: Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics. J Mol Biol. 2004, 338 (2): 207-215. 10.1016/j.jmb.2004.02.048.PubMedView ArticleGoogle Scholar
- Matys V, Fricke E, Geffers R, Gössling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Münch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003, 31 (1): 374-378. 10.1093/nar/gkg108.PubMedPubMed CentralView ArticleGoogle Scholar
- Bergman CM, Carlson JW, Celniker SE: Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics. 2005, 21 (8): 1747-1749. 10.1093/bioinformatics/bti173.PubMedView ArticleGoogle Scholar
- Higo K, Ugawa Y, Iwamoto M, Korenaga T: Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Res. 1999, 27 (1): 297-300. 10.1093/nar/27.1.297.PubMedPubMed CentralView ArticleGoogle Scholar
- Galuschka C, Schindler M, Bülow L, Hehl R: AthaMap web tools for the analysis and identification of co-regulated genes. Nucleic Acids Res. 2007, 35: D857-862. 10.1093/nar/gkl1006.PubMedPubMed CentralView ArticleGoogle Scholar
- Kazakov AE, Cipriano MJ, Novichkov PS, Minovitsky S, Vinogradov DV, Arkin A, Mironov AA, Gelfand MS, Dubchak I: RegTransBase–a database of regulatory sequences and interactions in a wide range of prokaryotic genomes. Nucleic Acids Res. 2007, 35: D407-D412. 10.1093/nar/gkl865.PubMedPubMed CentralView ArticleGoogle Scholar
- Robison K, McGuire AM, Church GM: A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome. J Mol Biol. 1998, 284 (2): 241-254. 10.1006/jmbi.1998.2160.PubMedView ArticleGoogle Scholar
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431 (7004): 99-104. 10.1038/nature02800.PubMedPubMed CentralView ArticleGoogle Scholar
- MacIsaac KD, Wang T, Gordon DB, Gifford DK, Stormo GD, Fraenkel E: An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics. 2006, 7: 113-10.1186/1471-2105-7-113.PubMedPubMed CentralView ArticleGoogle Scholar
- Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B: JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acid Res. 2004, 32: D91-D94. 10.1093/nar/gkh012.PubMedPubMed CentralView ArticleGoogle Scholar
- Bailey TL, Gribskov M: Combining evidence using p-values: application to sequence homology searches. Bioinformatics. 1998, 14 (1): 48-54. 10.1093/bioinformatics/14.1.48.PubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.