The Babesia bovis gene and promoter model: an update from full-length EST analysis
© Yamagishi et al.; licensee BioMed Central Ltd. 2014
Received: 24 May 2013
Accepted: 8 July 2014
Published: 13 August 2014
Babesia bovis is an apicomplexan parasite that causes babesiosis in infected cattle. Genomes of pathogens contain promising information that can facilitate the development of methods for controlling infections. Although the genome of B. bovis is publically available, annotated gene models are not highly reliable prior to experimental validation. Therefore, we validated a preproposed gene model of B. bovis and extended the associated annotations on the basis of experimentally obtained full-length expressed sequence tags (ESTs).
From in vitro cultured merozoites, 12,286 clones harboring full-length cDNAs were sequenced from both ends using the Sanger method, and 6,787 full-length cDNAs were assembled. These were then clustered, and a nonredundant referential data set of 2,115 full-length cDNA sequences was constructed. The comparison of the preproposed gene model with our data set identified 310 identical genes, 342 almost identical genes, 1,054 genes with potential structural inconsistencies, and 409 novel genes. The median length of 5' untranslated regions (UTRs) was 152 nt. Subsequently, we identified 4,086 transcription start sites (TSSs) and 2,023 transcriptionally active regions (TARs) by examining 5' ESTs. We identified ATGGGG and CCCCAT sites as consensus motifs in TARs that were distributed around -50 bp from TSSs. In addition, we found ACACA, TGTGT, and TATAT sites, which were distributed periodically around TSSs in cycles of approximately 150 bp. Moreover, related periodical distributions were not observed in mammalian promoter regions.
The observations in this study indicate the utility of integrated bioinformatics and experimental data for improving genome annotations. In particular, full-length cDNAs with one-base resolution for TSSs enabled the identification of consensus motifs in promoter sequences and demonstrated clear distributions of identified motifs. These observations allowed the illustration of a model promoter composition, which supports the differences in transcriptional regulation frameworks between apicomplexan parasites and mammals.
Bovine babesiosis is a parasitic infection caused by a protozoan of the genus Babesia, order Piroplasmida, phylum Apicomplexa. Babesia bovis and Babesia bigemina are major species that impose a considerable economic burden on cattle industries because of their wide geographical distribution and pathogenicity . The clinical symptoms of B. bovis are more serious than those of B. bigemina, including fever, extensive erythrocyte lysis leading to anemia, icterus, hemoglobinuria, and death. Although antiparasitic drugs such as imidocarb successfully control these symptoms , they have severe side effects and may promote the emergence of resistant strains and residual chemicals. Therefore, safer chemical agents and vaccinations are required.
In general, the genome is an excellent tool for understanding all life forms. Unique genes and pathways that are elucidated from genomes are often recognized as targets for chemical or vaccine development. Because the genome sequence of B. bovis is publically available , it may offer promising information for the development of novel approaches for controlling parasitic infections. According to a previous bioinformatics study, the B. bovis genome encodes 3,671 nuclear protein-coding genes. However, estimated gene models based on bioinformatics lack accuracy in nonmodel organisms. Inconsistencies in gene models have been reported between bioinformatics estimates and experimental observations of apicomplexan parasites [4, 5]. Therefore, to improve reliability, gene models require verification with experimental evidence.
The acquisition of mRNA sequences is one of the most straightforward strategies for verifying gene models. Specifically, full-length cDNA libraries facilitate the identification of transcription start sites (TSSs), exon and intron structures, 5′ and 3′ untranslated regions (UTRs), and polyadenylation sites. Moreover, massive sets of TSSs can be used to identify transcriptionally active regions (TARs), which are closely related to promoter regions [6, 7]. Therefore, the determination of full-length cDNA transcriptomes is critical for revisions of gene models, and for elucidation of transcriptional mechanisms.
In this study, we collected 5′ and 3′ expressed sequence tags (ESTs) from full-length cDNAs of B. bovis that were synthesized using the oligo-capping method . In brief, cap structures at 5′ ends of mRNA were replaced with synthetic linker RNA sequences using the oligo-capping method. Subsequently, chimeric RNA was used to synthesize cDNA with fixed 5′ transcript sequences. This cDNA was then sequenced, and the data was entered into an updated gene model to identify novel genes. In addition, consensus sequences around TSSs and putative DNA cis-elements for transcriptional control were identified by comparison with promoter regions identified in genome-wide analyses.
Results and discussion
Construction and analysis of full-length cDNA
Summary of ESTs and contigs
Total number of isolated clones
5' one-pass sequence
3' one-pass sequence
Non-redundant contig sequence1)
Amino acid variant3)
Assigned in this study5)
The 5′ UTRs that lie between TSSs and first in-frame initiation codons are known to play crucial roles in post-transcriptional regulation by modulating translational efficiency and mRNA stability through the actions of IRES and riboswitches [10, 11]. This mechanism is observed in a wide variety of organisms, including humans, plants, and yeast [12–14], suggesting that apicomplexan parasites have similar functions. However, these functions have been poorly investigated. Therefore, to elucidate the functions of the 5′ UTRs of B. bovis, we constructed a genome-wide 5′ UTR sequence data set using full-length cDNA sequences and demonstrated that the median length of the 5′ UTRs of B. bovis is 152 nts. The average 5′ UTRs are 210.2 nts in humans, 186.3 nts in rodents, 221.9 nts in invertebrates, 103.0 nts in viridiplantae, and 134.0 nts in fungi  and the mode length is approximately 130 nts in Toxoplasma gondii. These lengths agree with our observations in B. bovis. Similarly, the median length of the 3′ UTRs of B. bovis is 116 nts (Additional file 2: Table S2).
Gene expression frequencies are also indicated in EST data. Therefore, we examined the 9,573 5′ ESTs data set and selected 9,546 sequences following successful mapping onto the B. bovis genome. To estimate expression frequencies, these were then mapped onto preproposed CDSs with novel sequences identified in this study (Additional file 3: Table S3 and Additional file 4: Figure S1). The resulting ranking was not identical to that in a previous study of ESTs , although it showed similar tendencies. These discrepancies may reflect differences in culture conditions and parasite strains or sampling errors associated with small data sets. Logarithmic plots of expression levels and ranks of each gene resembled the power law (Additional file 4: Figure S1) and indicated similar transcriptome distributions to those observed in previous studies [18, 19].
B. bovispromoter components and typical structure
Transcription is controlled by the coordinated binding of promoter sequences by transactivators. In humans and model organisms, promoter structures have been intensively examined in a genome-wide manner [20–22] and have been shown to play pivotal roles in gene and phenotype expression. However, the promoter structure of Babesia spp. remains unknown. Therefore, we characterized the promoter structure of B. bovis using high resolution TSS information derived from a full-length cDNA data set.
Genome-wide TSS distributions were examined by mapping 5′ ends of 5′ ESTs. In briefly, 9,412 reliable 5′ end sequences of 36 nts were selected from 9,573 5′ ESTs. Of these, 7,111 were successfully mapped onto the B. bovis genome sequence, 4,086 locations were assigned as TSSs after considering redundantly mapped sequences, and 2,023 TARs were identified.
FWM for B. bovis initiator-like motifs from 801 TSSs
Position from TSSs
In subsequent analyses, we identified a cis-element that is involved in transcriptional control. To generate a putative promoter set, -1000 to +1000 regions from typical TSSs of the 2,023 TARs were selected and examined using CisFinder , and -100–0 regions were examined using MEME . These analyses showed frequent distribution of ATGGGG and ACACA within promoter regions.
To validate the ACACA motif, we examined the positional distribution of these candidates and found periodical distribution around TSSs (Figure 3C). The reciprocal sequence TGTGT was also periodically distributed, but its phase was shifted (Figure 3C). Based on these observations, we examined all 5-mer repeat motifs comprising two nucleotides and found periodical distribution of TATAT as an additional motif (Figure 3C), with similar cycles but differing phases (Figure 3C). The related motifs CACAC, GTGTG, and ATATA also showed similar distributions (data not shown). The only other combinations that showed distinguishing distributions were GGGGG, CCCCC, GCGCG, and CGCGC, with peaks around -50 nts (Figure 3D). GGGGG and CCCCC motifs are closely related to ATGGGG and CCCCAT motifs, respectively. However, GCGCG and CGCGC motifs may be functional and gene ontology enrichment analyses showed frequent but insignificant presence of these in upstream promoter regions of ribonucleoprotein complex biogenesis (GO:0022613) genes (p = 0.078). To confirm the specificity of these motifs for apicomplexan parasites, we examined periodical distributions in the TSS database DBTSS, which contains precise positions of TSSs in the genomes of various organisms . Promoter regions from -1000 to +1000 of human and mouse TSSs were obtained and the distribution of ACACA, TGTGT, and TATAT motifs were examined as in B. bovis. However, no periodical distributions were found in human (Figure 3E) or mouse (data not shown) databases, and no related periodical distributions of other combinations were observed as in B. bovis (Figure 3F). In contrast, the ACACA motif was reportedly observed in T. parva and P. falciparum[31, 34], although periodical distributions have not been reported. Rather than reflecting the differences in species, these discrepancies may have been caused by differences in the precision of TSS identification. Nonetheless, these observations imply that the motif is common among some apicomplexan parasites, and the present periodical patterns had interval lengths of 140–150 nts. Minimum units of nucleosome repeat lengths comprise 147-bp DNA sequences around core histone octamers and 20-bp DNA linkers and are much longer than our observations. However, previous studies demonstrate that the minimum observed nucleosome repeat length is much closer to our observation of approximately 155 bp in Schizosaccharomyces pombe and Aspergillus nidulans[35–37], and P. falciparum[38, 39]. On the other hand, these discrepancies may reflect the involvement of unconventional nucleosome structures. The conventional histone octomer comprises two H2A, H2B dimers and H3, H4 tetramers. In contrast, unconventional histones comprising variants such as H2B.Z, H2A.Z, and CENP-A have specific functions that are distinguishable from the conventional one. Crystal structure analysis of human centromeric nucleosomes containing CENP-A suggests that only 121-bp DNA fragments tightly bind to nucleosomes, unlike conventional H3 nucleosomes . In P. falciparum, it was demonstrated that the nucleosome with H2A.Z specifically localizes to intergenic regions [41, 42]. Moreover, no homologue to the linker histone H1 has been identified in apicomplexan parasites . FAIRE-seq and MAINE-seq analyses in P. falciparum demonstrated that nucleosome binding to TSSs is associated with gene expression  and there are preferred DNA motifs for nucleosome assembly [45, 46]. These collateral data warrant the assumption that the observed periodical patterns in this study are involved in chromatin structure and regulate gene expression via chromatin remodeling processes.
In further analyses, we applied this scanning method to known apicomplexan and mammalian core promoter motifs. A previous study showed the distribution of GATTCC in T. parva and T. annulata at regions that are -20 nts from TSS . Moreover, GCATGC was identified as a PF14_0633-binding target in P. falciparum, an AP2-Sp-binding target in Plasmodium berghei, and a Toxoplasma Ribosomal Protein (TRP)-2-binding target in T. gondii. GTGCAC is known as a subtelomeric variant gene promoter element (SPE)-2  and a binding target of PFF0200c_DLD and PfSIP2 [47, 50, 51]. TAGCTA is also reportedly a binding target of Pb.AP2-O . Therefore, although the GTGCAC motif was moderately concentrated in the -50-nt area, the other motifs did not show distinguishing distributions in comparison with the ATGGGG/CCCCAT motif (Figure 3G). In particular, GATTCC was not specifically distributed around TSSs, as observed in T. parva and T. annulata, indicating that the motif is specific to Theileria spp. and may be involved in specific biological phenotypes, such as infectivity in lymphocytes. According to mammalian motifs, we examined TATA boxes, upstream TFIIB-recognition elements (BREu), downstream BRE (BREd), and downstream promoter elements (DPE), containing the consensus sequences TATAWAAR, SSRCGCC, RTDKKKK, and RGWYVT, respectively . In these analyses, TATA boxes showed periodical-like patterns (Figure 3G). In contrast, TATA boxes are known to be distributed around -30 nts from TSSs (Figure 3E), and TATA box consensus sequences are closely related to TATAT and ATATA motifs. These are also periodically distributed, suggesting that the observed pattern for TATA boxes was residual and no functional motifs correspond with the TATA box in B. bovis. This observation also indicates that other mammalian motifs are nonfunctional (Figure 3G).
Previous investigations of Plasmodium and Toxoplasma demonstrate promoter structures [43, 53–55], putative DNA cis-elements [34, 44, 50, 54, 56–58], and the involvement of chromatin structures in transcription [44, 53, 55, 59]. The present analyses of Babesia parasites were almost consistent with these studies and warrant the expansion of the concepts related to Babesia species. Nonetheless, the use of fine TSS mapping is a critical distinction between the present and previous studies and allowed more specific and sensitive assessment of the distribution of examined motifs, particularly for ACACA, TGTGT, and TATAT motifs that lack definition in previous studies [34, 50]. Therefore, the present analyses indicate that the distance from TSSs may be a critical factor for functionality of DNA cis-elements in apicomplexsan parasites.
The full-length cDNAs dataset enable us to revise previous gene model derived from the genome. In parallel, location-specific consensus motifs in promoter sequences were discovered by virtue of TSSs identification with one-base resolution of the method. These observations 1) indicate the utility of integrated bioinformatics and experimental data for improving genome annotations and 2) allowed the illustration of a model promoter composition, which supports the differences in transcriptional regulation frameworks between apicomplexan parasites and mammals.
Preparation of parasite RNA, and synthesis and sequencing of cDNA
The Texas strain of B. bovis was maintained in bovine erythrocytes cultured in GIT medium (WAKO, Osaka, Japan) using a microaerophilic stationary-phase culture system . Total RNA was extracted from B. bovis-infected erythrocytes using TRIzol (Invitrogen), and cDNA was synthesized using a previously described oligo-capping method . In briefly, 200 μg of purified total RNA was dephosphorylated using bacterial alkaline phosphatase and was ligated using the oligo-RNA 5′-AGCAUCGAGUCGGCCUUGUUGGCCUACUGG-3′ and T4 RNA ligase. Subsequently, cDNA was synthesized using the oligo-dT fusion primer 5′-GCGGCTGAAGACGGCCTATGTGGCCTTTTTTTTTTTTTTTTT-3′ with SuperScript® II (Invitrogen). The cDNA library was amplified using PCR with the primers 5′-AGCATCGAGTCGGCCTTGTTG-3′ and 5′-GCGGCTGAAGACGGCCTATGT-3′. Amplified fragments were then digested using SfiI and were ligated into a DraIII-digested pME18SFL3 plasmid vector in an orientation-defined manner. ESTs of 5′ and 3′ ends were obtained using the Sanger method with ABI 3730 sequencers following standard protocols for sequencing analysis.
Assembly, clustering, and annotation of ESTs
To obtain full-length cDNA sequences, 5′ and 3′ ESTs were assembled using a CAP3  with default parameters. The overlapping nucleic acid length cutoff was 40 and the overlapping identity cutoff was 90% for 5′ and 3′ ESTs, respectively. Putative CDSs of full-length cDNA were examined and intact CDSs of >50 amino acids were selected. Amino acid homology with preproposed genes (BbovisT2BoAnnotatedProteins_PiroplasmaDB-1.1.fasta) was examined using BLAST and homology was considered significant when E-values were 10-10. Full-length cDNA sequences were also mapped onto the genome of B. bovis (BbovisT2BoGenomic_PiroplasmaDB-1.1.fasta) using BLAT  with default parameters. The categories of full-length cDNA sequences (tier 1; identical, tier 2; amino acid variant, tier 3; structural variant, and tier 4; novel) were assigned according to BLAST and BLAT results using an in-house script. Copy DNAs that were mapped to positions of identical nucleic acid sequences of preproposed genes were assigned as “identical”, and those mapped to identical positions but with amino acid substitutions were assigned as “amino acid variants”. Copy DNAs that were mapped to similar but nonidentical positions to homologous preproposed genes were assigned as “structural variants”, and those with poor homology were assigned as “novel”. Full-length cDNAs were clustered using CAP3 with default parameters, and the subset of cDNAs in the same cluster were integrated into the highest tier. Among full-length cDNAs with differing sequences and the same tier status, cDNAs with fewer mutations and longer amino acids or nucleic acids were selected, and novel cDNAs were annotated using Blast2Go . To estimate gene expression, 5′ ESTs were examined. Initially, these were filtered by BLAST using the B. bovis genome database (BbovisT2BoGenomic). Subsequently, the filtered sequences were mapped onto both CDS (BbovisT2BoAnnotatedCDS) and novel sequences using BLAST, and the frequencies of mapped ESTs were determined.
Identification of promoter regions and prevalent motifs
To identify TSSs for each 5′ EST, 36 nts were clipped from 5′ ends and removed if they contained ambiguous nucleotides such as N. Selected sequence sets were then mapped onto the genome sequence of the B. bovis T2Bo strain  using Bowtie  with the acceptance of two mismatches. To assign TARs, mapped positions corresponding with TSSs were clustered if two TSSs were positioned within 20 nts. Gross mapped counts for each position and TAR were tallied, and the most frequently mapped TSS in each TAR was assigned as a representative TSS, as defined in previous reports [33, 66]. To identify motifs on TSSs, -10 to +10 regions from representative TSSs were selected and examined using MEME . A frequency weight matrix (FWM) was calculated on the basis of sequence motifs with p values of <0.05, and a sequence logo was generated on the basis of FWM using WebLogo . As a putative promoter region, sequences comprising -1000 to +1000 regions from representative TSSs of each TAR were selected. Human and mouse promoter regions were obtained from DBTSS . Candidate DNA motifs were then estimated using CisFinder  with default parameters and -1000 to +1000 regions from representative TSSs. These were also estimated using MEME with n sites of 2023, maximum DNA sizes of 250000, and maxw of 8 as parameters in -100–0 regions of representative TSSs. The distributions of identified motifs around peak TSSs were examined by scanning the motif over the promoter using an in-house script. The positions of each motif were scanned in exact match condition and summed for every 10 bases. Observed frequencies were divided by the theoretical frequency based on nucleotide biases that were estimated from the nucleotide composition of the genome. The distributions of 5-mer motif candidates were smoothed by averaging the surrounding 30 bases. For functional enrichment analyses, we selected TARs with motifs from the initial TAR set of 2,023-nt sequences. Genes and TARs were linked if TARs were present in the -500 to +200 region from the 5′ end of the CDS, as defined in the BbovisT2BoAnnotatedCDS. Finally, gene ontology terms for B. bovis genes were annotated using Blast2Go .
Availability of supporting data
Supporting sequence data are available in the DDBJ (http://www.ddbj.nig.ac.jp/index-e.html) under accession numbers HX874250–HX894778 and AK440354–AK442468.
This paper is dedicated to the memory of Dr. Junichi Watanabe.
This work was supported in part by the Japan Society for the Promotion of Science (JSPS) Core-to-Core Program “Research Exchange in Genome Cohort Studies for Field Malaria Parasites and Vector Insects,” a Grant-in-aid for Scientific Research on Innovative Areas “Genome Science” (221S0002) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT), and a Cooperative Research Grant (26-joint-12) of National Research Center for Protozoan Diseases, Obihiro University of Agriculture and Veterinary Medicine.
- Bock R, Jackson L, De Vos A, Jorgensen W: Babesiosis of cattle. Parasitology. 2004, 129 (Suppl): S247-S269.PubMedView ArticleGoogle Scholar
- Vial HJ, Gorenflot A: Chemotherapy against babesiosis. Vet Parasitol. 2006, 138 (1–2): 147-160.PubMedView ArticleGoogle Scholar
- Brayton KA, Lau AO, Herndon DR, Hannick L, Kappmeyer LS, Berens SJ, Bidwell SL, Brown WC, Crabtree J, Fadrosh D, Feldblum T, Forberger HA, Haas BJ, Howell JM, Khouri H, Koo H, Mann DJ, Norimine J, Paulsen IT, Radune D, Ren Q, Smith RK, Suarez CE, White O, Wortman JR, Knowles DP, McElwain TF, Nene VM: Genome sequence of Babesia bovis and comparative analysis of apicomplexan hemoprotozoa. PLoS Pathog. 2007, 3 (10): 1401-1413.PubMedView ArticleGoogle Scholar
- Wakaguri H, Suzuki Y, Sasaki M, Sugano S, Watanabe J: Inconsistencies of genome annotations in apicomplexan parasites revealed by 5′-end-one-pass and full-length sequences of oligo-capped cDNAs. BMC Genomics. 2009, 10: 312-PubMed CentralPubMedView ArticleGoogle Scholar
- Yamagishi J, Wakaguri H, Sugano S, Kawano S, Fujisaki K, Sugimoto C, Watanabe J, Suzuki Y, Kimata I, Xuan X: Construction and analysis of full-length cDNA library of Cryptosporidium parvum. Parasitol Int. 2011, 60 (2): 199-202.PubMedView ArticleGoogle Scholar
- Tsuchihara K, Suzuki Y, Wakaguri H, Irie T, Tanimoto K, Hashimoto S, Matsushima K, Mizushima-Sugano J, Yamashita R, Nakai K, Bentley D, Esumi H, Sugano S: Massive transcriptional start site analysis of human genes in hypoxia cells. Nucleic Acids Res. 2009, 37 (7): 2249-2263.PubMed CentralPubMedView ArticleGoogle Scholar
- Yamagishi J, Wakaguri H, Ueno A, Goo YK, Tolba M, Igarashi M, Nishikawa Y, Sugimoto C, Sugano S, Suzuki Y, Watanabe J, Xuan X: High-resolution characterization of Toxoplasma gondii transcriptome with a massive parallel sequencing method. DNA Res. 2010, 17 (4): 233-243.PubMed CentralPubMedView ArticleGoogle Scholar
- Suzuki Y, Sugano S: Construction of full-length-enriched cDNA libraries. The oligo-capping method. Methods Mol Biol. 2001, 175: 143-153.PubMedGoogle Scholar
- Majoros WH, Pertea M, Salzberg SL: TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004, 20 (16): 2878-2879.PubMedView ArticleGoogle Scholar
- Komar AA, Mazumder B, Merrick WC: A new framework for understanding IRES-mediated translation. Gene. 2012, 502 (2): 75-86.PubMed CentralPubMedView ArticleGoogle Scholar
- Serganov A, Nudler E: A decade of riboswitches. Cell. 2013, 152 (1–2): 17-24.PubMed CentralPubMedView ArticleGoogle Scholar
- Mignone F, Gissi C, Liuni S, Pesole G: Untranslated regions of mRNAs. Genome Biol. 2002, 3 (3): REVIEWS0004-PubMed CentralPubMedView ArticleGoogle Scholar
- Creancier L, Morello D, Mercier P, Prats AC: Fibroblast growth factor 2 internal ribosome entry site (IRES) activity ex vivo and in transgenic mice reveals a stringent tissue-specific regulation. J Cell Biol. 2000, 150 (1): 275-281.PubMed CentralPubMedView ArticleGoogle Scholar
- Lawless C, Pearson RD, Selley JN, Smirnova JB, Grant CM, Ashe MP, Pavitt GD, Hubbard SJ: Upstream sequence elements direct post-transcriptional regulation of gene expression under stress conditions in yeast. BMC Genomics. 2009, 10: 7-PubMed CentralPubMedView ArticleGoogle Scholar
- Pesole G, Mignone F, Gissi C, Grillo G, Licciulli F, Liuni S: Structural and functional features of eukaryotic mRNA untranslated regions. Gene. 2001, 276 (1–2): 73-81.PubMedView ArticleGoogle Scholar
- Yamagishi J, Watanabe J, Goo YK, Masatani T, Suzuki Y, Xuan X: Characterization of Toxoplasma gondii 5′ UTR with encyclopedic TSS information. J Parasitol. 2012, 98 (2): 445-447.PubMedView ArticleGoogle Scholar
- De Vries E, Corton C, Harris B, Cornelissen AW, Berriman M: Expressed sequence tag (EST) analysis of the erythrocytic stages of Babesia bovis. Vet Parasitol. 2006, 138 (1–2): 61-74.PubMedView ArticleGoogle Scholar
- Ogasawara O, Kawamoto S, Okubo K: Zipf’s law and human transcriptomes: an explanation with an evolutionary model. C R Biol. 2003, 326 (10–11): 1097-1101.PubMedView ArticleGoogle Scholar
- Ueda HR, Hayashi S, Matsuyama S, Yomo T, Hashimoto S, Kay SA, Hogenesch JB, Iino M: Universality and flexibility in gene expression from bacteria to human. Proc Natl Acad Sci U S A. 2004, 101 (11): 3765-3769.PubMed CentralPubMedView ArticleGoogle Scholar
- Butler JE, Kadonaga JT: The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 2002, 16 (20): 2583-2592.PubMedView ArticleGoogle Scholar
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006, 34 (Database issue): D108-D110.PubMed CentralPubMedView ArticleGoogle Scholar
- Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, Khatun J, Lajoie BR, Landt SG, Lee BK, Pauli F, Rosenbloom KR, Sabo P, Safi A, Sanyal A, Shoresh N, Simon JM, Song L, Trinklein ND, Altshuler RC, Birney E, Brown JB, Cheng C, Djebali S, Dong X, ENCODE Project Consortium, et al: An integrated encyclopedia of DNA elements in the human genome. Nature. 2012, 489 (7414): 57-74.View ArticleGoogle Scholar
- Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings/International Conference on Intelligent Systems for Molecular Biology; ISMB International Conference on Intelligent Systems for. Mol Biol. 1994, 2: 28-36.Google Scholar
- Bucher P: Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J Mol Biol. 1990, 212 (4): 563-578.PubMedView ArticleGoogle Scholar
- Shahmuradov IA, Gammerman AJ, Hancock JM, Bramley PM, Solovyev VV: PlantProm: a database of plant promoter sequences. Nucleic Acids Res. 2003, 31 (1): 114-117.PubMed CentralPubMedView ArticleGoogle Scholar
- Verrijzer CP, Yokomori K, Chen JL, Tjian R: Drosophila TAFII150: similarity to yeast gene TSM-1 and specific binding to core promoter DNA. Science. 1994, 264 (5161): 933-941.PubMedView ArticleGoogle Scholar
- Chalkley GE, Verrijzer CP: DNA binding site selection by RNA polymerase II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. EMBO J. 1999, 18 (17): 4835-4845.PubMed CentralPubMedView ArticleGoogle Scholar
- Tora L: A unified nomenclature for TATA box binding protein (TBP)-associated factors (TAFs) involved in RNA polymerase II transcription. Genes Dev. 2002, 16 (6): 673-675.PubMedView ArticleGoogle Scholar
- Callebaut I, Prat K, Meurice E, Mornon JP, Tomavo S: Prediction of the general transcription factors associated with RNA polymerase II in Plasmodium falciparum: conserved features and differences relative to other eukaryotes. BMC Genomics. 2005, 6: 100-PubMed CentralPubMedView ArticleGoogle Scholar
- Sharov AA, Ko MS: Exhaustive search for over-represented DNA sequence motifs with CisFinder. DNA Res. 2009, 16 (5): 261-273.PubMed CentralPubMedView ArticleGoogle Scholar
- Guo X, Silva JC: Properties of non-coding DNA and identification of putative cis-regulatory elements in Theileria parva. BMC Genomics. 2008, 9: 582-PubMed CentralPubMedView ArticleGoogle Scholar
- Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004, 20 (9): 1464-1465.PubMedView ArticleGoogle Scholar
- Yamashita R, Sugano S, Suzuki Y, Nakai K: DBTSS: DataBase of Transcriptional Start Sites progress report in 2012. Nucleic Acids Res. 2012, 40 (Database issue): D150-D154.PubMed CentralPubMedView ArticleGoogle Scholar
- Young JA, Johnson JR, Benner C, Yan SF, Chen K, Le Roch KG, Zhou Y, Winzeler EA: In silico discovery of transcription regulatory elements in Plasmodium falciparum. BMC Genomics. 2008, 9: 70-PubMed CentralPubMedView ArticleGoogle Scholar
- Godde JS, Widom J: Chromatin structure of Schizosaccharomyces pombe. A nucleosome repeat length that is shorter than the chromatosomal DNA length. J Mol Biol. 1992, 226 (4): 1009-1025.PubMedView ArticleGoogle Scholar
- Morris NR: Nucleosome structure in Aspergillus nidulans. Cell. 1976, 8 (3): 357-363.PubMedView ArticleGoogle Scholar
- Lantermann AB, Straub T, Stralfors A, Yuan GC, Ekwall K, Korber P: Schizosaccharomyces pombe genome-wide nucleosome mapping reveals positioning mechanisms distinct from those of Saccharomyces cerevisiae. Nat Struct Mol Biol. 2010, 17 (2): 251-257.PubMedView ArticleGoogle Scholar
- Lanzer M, De Bruin D, Wertheimer SP, Ravetch JV: Transcriptional and nucleosomal characterization of a subtelomeric gene cluster flanking a site of chromosomal rearrangements in Plasmodium falciparum. Nucleic Acids Res. 1994, 22 (20): 4176-4182.PubMed CentralPubMedView ArticleGoogle Scholar
- Lanzer M, Wertheimer SP, De Bruin D, Ravetch JV: Chromatin structure determines the sites of chromosome breakages in Plasmodium falciparum. Nucleic Acids Res. 1994, 22 (15): 3099-3103.PubMed CentralPubMedView ArticleGoogle Scholar
- Tachiwana H, Kagawa W, Shiga T, Osakabe A, Miya Y, Saito K, Hayashi-Takanaka Y, Oda T, Sato M, Park SY, Kimura H, Kurumizaka H: Crystal structure of the human centromeric nucleosome containing CENP-A. Nature. 2011, 476 (7359): 232-235.PubMedView ArticleGoogle Scholar
- Bartfai R, Hoeijmakers WA, Salcedo-Amaya AM, Smits AH, Janssen-Megens E, Kaan A, Treeck M, Gilberger TW, Francoijs KJ, Stunnenberg HG: H2A.Z demarcates intergenic regions of the plasmodium falciparum epigenome that are dynamically marked by H3K9ac and H3K4me3. PLoS Pathog. 2010, 6 (12): e1001223-PubMed CentralPubMedView ArticleGoogle Scholar
- Hoeijmakers WA, Salcedo-Amaya AM, Smits AH, Francoijs KJ, Treeck M, Gilberger TW, Stunnenberg HG, Bartfai R: H2A.Z/H2B.Z double-variant nucleosomes inhabit the AT-rich promoter regions of the Plasmodium falciparum genome. Mol Microbiol. 2013, 87 (5): 1061-1073.PubMed CentralPubMedView ArticleGoogle Scholar
- Meissner M, Soldati D: The transcription machinery and the molecular toolbox to control gene expression in Toxoplasma gondii and other protozoan parasites. Microbes Infect. 2005, 7 (13): 1376-1384.PubMedView ArticleGoogle Scholar
- Ponts N, Harris EY, Lonardi S, Le Roch KG: Nucleosome occupancy at transcription start sites in the human malaria parasite: a hard-wired evolution of virulence?. Infect Genet Evol. 2011, 11 (4): 716-724.PubMed CentralPubMedView ArticleGoogle Scholar
- Segal E, Fondufe-Mittendorf Y, Chen L, Thastrom A, Field Y, Moore IK, Wang JP, Widom J: A genomic code for nucleosome positioning. Nature. 2006, 442 (7104): 772-778.PubMed CentralPubMedView ArticleGoogle Scholar
- Ichikawa Y, Morohashi N, Nishimura Y, Kurumizaka H, Shimizu M: Telomeric repeats act as nucleosome-disfavouring sequences in vivo. Nucleic Acids Res. 2014, 42 (3): 1541-1552.PubMed CentralPubMedView ArticleGoogle Scholar
- De Silva EK, Gehrke AR, Olszewski K, Leon I, Chahal JS, Bulyk ML, Llinas M: Specific DNA-binding by apicomplexan AP2 transcription factors. Proc Natl Acad Sci U S A. 2008, 105 (24): 8393-8398.PubMed CentralPubMedView ArticleGoogle Scholar
- Yuda M, Iwanaga S, Shigenobu S, Kato T, Kaneko I: Transcription factor AP2-Sp and its target genes in malarial sporozoites. Mol Microbiol. 2010, 75 (4): 854-863.PubMedView ArticleGoogle Scholar
- Van Poppel NF, Welagen J, Vermeulen AN, Schaap D: The complete set of Toxoplasma gondii ribosomal protein genes contains two conserved promoter elements. Parasitology. 2006, 133 (Pt 1): 19-31.PubMedView ArticleGoogle Scholar
- Campbell TL, De Silva EK, Olszewski KL, Elemento O, Llinas M: Identification and genome-wide prediction of DNA binding specificities for the ApiAP2 family of regulators from the malaria parasite. PLoS Pathog. 2010, 6 (10): e1001165-PubMed CentralPubMedView ArticleGoogle Scholar
- Flueck C, Bartfai R, Niederwieser I, Witmer K, Alako BT, Moes S, Bozdech Z, Jenoe P, Stunnenberg HG, Voss TS: A major role for the Plasmodium falciparum ApiAP2 protein PfSIP2 in chromosome end biology. PLoS Pathog. 2010, 6 (2): e1000784-PubMed CentralPubMedView ArticleGoogle Scholar
- Yuda M, Iwanaga S, Shigenobu S, Mair GR, Janse CJ, Waters AP, Kato T, Kaneko I: Identification of a transcription factor in the mosquito-invasive stage of malaria parasites. Mol Microbiol. 2009, 71 (6): 1402-1414.PubMedView ArticleGoogle Scholar
- Horrocks P, Wong E, Russell K, Emes RD: Control of gene expression in Plasmodium falciparum - ten years on. Mol Biochem Parasitol. 2009, 164 (1): 9-25.PubMedView ArticleGoogle Scholar
- Militello KT, Dodge M, Bethke L, Wirth DF: Identification of regulatory elements in the Plasmodium falciparum genome. Mol Biochem Parasitol. 2004, 134 (1): 75-88.PubMedView ArticleGoogle Scholar
- Horrocks P, Dechering K, Lanzer M: Control of gene expression in Plasmodium falciparum. Mol Biochem Parasitol. 1998, 95 (2): 171-181.PubMedView ArticleGoogle Scholar
- Wu J, Sieglaff DH, Gervin J, Xie XS: Discovering regulatory motifs in the Plasmodium genome using comparative genomics. Bioinformatics. 2008, 24 (17): 1843-1849.PubMed CentralPubMedView ArticleGoogle Scholar
- Mullapudi N, Joseph SJ, Kissinger JC: Identification and functional characterization of cis-regulatory elements in the apicomplexan parasite Toxoplasma gondii. Genome Biol. 2009, 10 (4): R34-PubMed CentralPubMedView ArticleGoogle Scholar
- Harris EY, Ponts N, Le Roch KG, Lonardi S: Chromatin-driven de novo discovery of DNA binding motifs in the human malaria parasite. BMC Genomics. 2011, 12: 601-PubMed CentralPubMedView ArticleGoogle Scholar
- Westenberger SJ, Cui L, Dharia N, Winzeler E, Cui L: Genome-wide nucleosome mapping of Plasmodium falciparum reveals histone-rich coding and histone-poor intergenic regions and chromatin remodeling of core and subtelomeric genes. BMC Genomics. 2009, 10: 610-PubMed CentralPubMedView ArticleGoogle Scholar
- Bork S, Okamura M, Matsuo T, Kumar S, Yokoyama N, Igarashi I: Host serum modifies the drug susceptibility of Babesia bovis in vitro. Parasitology. 2005, 130 (Pt 5): 489-492.PubMedView ArticleGoogle Scholar
- Suzuki Y, Sugano S: Construction of a full-length enriched and a 5′-end enriched cDNA library using the oligo-capping method. Methods Mol Biol. 2003, 221: 73-91.PubMedGoogle Scholar
- Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877.PubMed CentralPubMedView ArticleGoogle Scholar
- Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664.PubMed CentralPubMedView ArticleGoogle Scholar
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676.PubMedView ArticleGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): R25-PubMed CentralPubMedView ArticleGoogle Scholar
- Yamashita R, Suzuki Y, Sugano S, Nakai K: Genome-wide analysis reveals strong correlation between CpG islands with nearby transcription start sites of genes and their tissue specificity. Gene. 2005, 350 (2): 129-136.PubMedView ArticleGoogle Scholar
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14 (6): 1188-1190.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.