Trypanosoma cruzi mitochondrial maxicircles display species- and strain-specific variation and a conserved element in the non-coding region
© Westenberger et al; licensee BioMed Central Ltd. 2006
Received: 12 January 2006
Accepted: 22 March 2006
Published: 22 March 2006
The mitochondrial DNA of kinetoplastid flagellates is distinctive in the eukaryotic world due to its massive size, complex form and large sequence content. Comprised of catenated maxicircles that contain rRNA and protein-coding genes and thousands of heterogeneous minicircles encoding small guide RNAs, the kinetoplast network has evolved along with an extreme form of mRNA processing in the form of uridine insertion and deletion RNA editing. Many maxicircle-encoded mRNAs cannot be translated without this post-transcriptional sequence modification.
We present the complete sequence and annotation of the Trypanosoma cruzi maxicircles for the CL Brener and Esmeraldo strains. Gene order is syntenic with Trypanosoma brucei and Leishmania tarentolae maxicircles. The non-coding components have strain-specific repetitive regions and a variable region that is unique for each strain with the exception of a conserved sequence element that may serve as an origin of replication, but shows no sequence identity with L. tarentolae or T. brucei. Alternative assemblies of the variable region demonstrate intra-strain heterogeneity of the maxicircle population. The extent of mRNA editing required for particular genes approximates that seen in T. brucei. Extensively edited genes were more divergent among the genera than non-edited and rRNA genes. Esmeraldo contains a unique 236-bp deletion that removes the 5'-ends of ND4 and CR4 and the intergenic region. Esmeraldo shows additional insertions and deletions outside of areas edited in other species in ND5, MURF1, and MURF2, while CL Brener has a distinct insertion in MURF2.
The CL Brener and Esmeraldo maxicircles represent two of three previously defined maxicircle clades and promise utility as taxonomic markers. Restoration of the disrupted reading frames might be accomplished by strain-specific RNA editing. Elements in the non-coding region may be important for replication, transcription, and anchoring of the maxicircle within the kinetoplast network.
The mitochondrial DNA referred to as the kinetoplast (kDNA) is a spectacular structure that comprises approximately 20–25% of the total cellular DNA in Trypanosoma cruzi, a member of the flagellated protozoans of the Order Kinetoplastida. The equally dramatic process of RNA editing is also found in this specialized subcellular compartment, and the two are intimately associated. The kDNA is composed of two classes of circular DNA molecules that are catenated and compressed into a disc-like structure. Maxicircles are the functional equivalent of the mitochondrial DNA of other eukaryotes, containing genes for mitochondrial rRNAs and hydrophobic mitochondrial proteins mostly involved in the membrane-bound oxidative phosphorylation pathway. At first glance the maxicircle genomes appear to lack several genes that are hallmarks of other mitochondrial genomes, while other genes are missing elements key to their translation, such as initiation codons or contiguous ORFs. Post-transcriptional uridine insertion/deletion RNA editing resolves most of these problems by creating start codons [3, 4], correcting internal frameshifts (e.g., four uridines inserted in COII [5–7]), and extensively modifying otherwise unrecognizable mRNA transcripts to create entire ORFs (e.g., 547 uridines inserted and 41 deleted in COIII of Trypanosoma brucei). The mechanistic process of RNA editing is the subject of intense study in T. brucei and Leishmania tarentolae [10, 11].
The heterogeneous minicircle population makes up the bulk of the kDNA mass with tens of thousands of copies per network, and carries the specific information for RNA editing in the form of guide RNAs (gRNAs). A handful of gRNAs are also encoded on the maxicircle, with 15 thus far identified in L. tarentolae and three in T. brucei. The gRNAs interact with the mRNA templates through hybridization to dictate the precise location and number of uridine insertions or deletions; the interaction is tolerant of G-U wobble base pairing, in addition to the standard Watson-Crick pairs. The level of sequence variability permissible in the gRNA pool with no loss of functional information is impressive, as any pairing with G or U residues in the mRNA is unaffected by transition mutations in the gRNA. This leads to tolerance of sequence divergence while maintaining function, as evidenced by the lack of cross-recognition of the gRNAs from two strains (SylvioX10 and CANIII) of T. cruzi. The minicircle variable regions that encode the gRNAs have been used as molecular markers for strain genotyping and comparison. The minicircle population of T. cruzi is highly variable between strains and evolves rapidly over time, allowing differentiation of closely related strains [15–18].
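The tolerance of gRNA–mRNA pairing to transition mutations can be illustrated with a minimal sketch. The function name `can_pair` and the toy sequences are hypothetical and are not taken from the study; the only rule encoded is the one stated above, namely Watson-Crick pairs plus G-U wobble pairs.

```python
# Allowed gRNA/mRNA base pairs: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(grna: str, mrna: str) -> bool:
    """Return True if grna (5'->3') pairs antiparallel with mrna (5'->3')
    at every position, allowing G-U wobble."""
    if len(grna) != len(mrna):
        return False
    # Antiparallel duplex: the 3' end of the gRNA faces the 5' end of the mRNA.
    return all((g, m) in PAIRS for g, m in zip(reversed(grna), mrna))


# Toy mRNA made only of G and U residues (hypothetical sequence).
mrna = "GUUGU"
print(can_pair("ACAAC", mrna))  # Watson-Crick gRNA
print(can_pair("GUGGU", mrna))  # same gRNA with every base changed by a transition
print(can_pair("ACC", "GAU"))   # a U->C transition opposite an A in the mRNA
```

In this toy case the fully transition-mutated gRNA still pairs with the G/U-only mRNA, whereas a transition opposite an A residue breaks the duplex.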
T. cruzi is the causative agent of human Chagas disease, affecting millions throughout the American continents, and is estimated to kill 100,000 people per year. The infection is spread by introduction of contaminated feces from the insect vector into an open wound or mucous membrane of the victim. Once inside the human host, T. cruzi infects macrophages and replicates aggressively, producing high titers of parasite in the bloodstream that then spread throughout the tissues of the body. Approximately 5% of infected individuals die in this acute stage. Ultimately the infection will enter a quiescent phase that can last for decades, characterized by a low blood titer of the parasite and no overt symptoms. In 30% of the cases, clinical pathologies will eventually develop after 20+ years, reflecting a preference for cardiac and smooth muscle tissue by the parasite. Chronic phase chagasic patients may die from one of a suite of mega syndromes affecting the heart, esophagus, or colon. While the factors determining development of these specific syndromes are not understood, the genetics of both the host and the parasite play roles in the outcome.
Unraveling the genetics of T. cruzi has been a major endeavor for decades. Currently T. cruzi is partitioned into two groups that can be subdivided into six subgroups or discrete typing units (DTUs) designated I, IIa, IIb, IIc, IId, and IIe [22–24]. Nuclear markers indicate that homozygous DTUs IIa and IIc are the result of a relatively ancient hybridization event between strains of DTUs I and IIb; the extensively heterozygous DTUs IId and IIe are products of a more recent hybridization event and possess two alleles similar to those found in DTUs IIb and IIc at most loci. The reference strain chosen for the T. cruzi genome-sequencing project, CL Brener, is a member of DTU IIe. Sequencing of Esmeraldo, a homozygous DTU IIb strain, was undertaken to aid in resolution of ambiguity in the assembly of the CL Brener genome.
The influence of a persistent T. cruzi infection on the development of the chronic stage of Chagas disease is the subject of debate, since the direct cause of pathogenesis has yet to be determined conclusively [26–31]. Essentially, there are two alternative hypotheses: 1) the presence of the parasite itself results in destruction of infected tissues, or 2) infection by the parasite triggers an autoimmune response against cells not necessarily harboring active infections. In addition to a multitude of protein antigens, an alternative stimulus for initiation of the autoimmune response has been proposed. During the cellular invasion process, DNA from the parasite is integrated into the host genome. The DNA species implicated in this horizontal transfer event is the minicircle [33, 34]. Whether this is a benign byproduct of infection or a potential link to Chagas disease pathology, the function of T. cruzi kDNA is relevant to understanding parasite biology and the host-parasite relationship.
Genetic variation in the T. cruzi maxicircle will be subject to more stringent selection pressure due to the presence of structural genes compared to the transition-tolerant gRNAs largely carried in the minicircles, and should provide phylogenetically meaningful markers. Maxicircle genes from T. brucei and L. tarentolae have been determined, showing that gene order, or synteny, is conserved. Estimates of the total size of the T. cruzi maxicircle fall between 21 and 39 kb by various methods [35, 36], and a fragment of maxicircle from the Tulahuen strain was sequenced. Assorted fragments of maxicircles from several T. cruzi strains have been examined for taxonomic studies [38–41]. An extensive T. cruzi maxicircle survey examined a 1.25-kb fragment, identifying a correlation between the DTUs and three maxicircle clades: clade A corresponded to DTU I, clade B to strains of DTUs IIa, IIc, IId, and IIe, and clade C was exclusive to DTU IIb strains. The same association was observed in similar analyses.
We present annotated sequences of the T. cruzi maxicircles for the CL Brener (DTU IIe) and Esmeraldo (DTU IIb) strains assembled from data generated by the TIGR-SBRI-KI T. cruzi Sequencing Consortium (TSK-TSC). The anticipated cohort of genes was present, and the non-coding regions of both strains were assembled. The coding region of each genome contained little or no single nucleotide polymorphism (SNP) variability, but does possess strain-specific insertion/deletion mutations (indels). Esmeraldo has a large deletion removing substantial portions of two adjacent genes. This study provides the framework for continued study of kDNA biology, RNA editing, and Chagas disease pathology in T. cruzi.
Two T. cruzi maxicircle assemblies
The T. cruzi genome project took the approach of whole-genome shotgun sequencing of clones from libraries of size-selected fragments from a total cell DNA preparation with a minimum insert size of 5 kb. Clones were sequenced from both ends, providing between 500 bp and 1 kb of primary sequence and a physical link between opposing end-sequenced mate pairs from each clone. A BLAST search of the primary reads from this project identified sequences of the CL Brener and Esmeraldo strains with identity to maxicircle coding regions from T. cruzi and T. brucei. The chromosome shotgun or large insert clone-based sequencing strategies employed by the L. major and T. brucei genome projects would preclude the capture of maxicircles.
Gene positions and lengths on CL Brener and Esmeraldo maxicircle consensus sequences
Editing of mRNA (recoverable entries): 5' end edited *; 5' end edited; 5' half edited; 5' end edited; 5' end edited; internal editing, +4 Us; 5' end edited
The coverage of these genomes provides a high level of confidence in the assembled products. The CL Brener nuclear genome was sequenced to approximately 7X coverage, and Esmeraldo to 2.2X coverage. Despite this difference, the maxicircle genome coverage is deep for both strains. The high relative abundance of maxicircle sequences in the Esmeraldo dataset suggests that the Esmeraldo total cell DNA preparation had a higher content of kDNA relative to the CL Brener starting material. The difference was probably due to loss of intact network during phenol extraction, and not representative of the relative amount of kDNA in each strain.
Nucleotide skew identifies gene orientation and editing locations
The T. cruzi maxicircle coding regions provide the primary templates for the RNA editing process. With the maxicircle annotations for L. tarentolae and T. brucei as guides, identifying ORFs and 'cryptic' genes that are extensively edited at the RNA level was straightforward. In comparative analyses CL Brener was used as the T. cruzi reference sequence.
Average percent identities among CL Brener, Esmeraldo, T. brucei and L. tarentolae rRNAs, gene coding regions and inferred protein sequences
Comparisons: CL Brener vs. Esmeraldo; CL Brener vs. T. brucei; Esmeraldo vs. T. brucei; CL Brener vs. L. tarentolae; Esmeraldo vs. L. tarentolae; T. brucei vs. L. tarentolae
Similarity of RNA editing in T. cruzi and T. brucei
The nucleotide skew analyses indicated that T. cruzi maxicircle coding regions have GC skew patterns suggestive of extensive editing of genes that are differentially edited in T. brucei and L. tarentolae. Direct comparisons of the maxicircle genomes were performed to confirm this observation.
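As a rough illustration of the kind of skew profile discussed here, the sketch below computes a sliding-window skew. It assumes the common convention skew = (X − Y)/(X + Y), for example (G − C)/(G + C) for GC skew; the window and step sizes, the function names, and the toy sequence are illustrative choices, not the parameters used in the study.

```python
def skew(seq: str, a: str, b: str) -> float:
    """Skew of base a over base b, (a - b) / (a + b); 0.0 if neither occurs."""
    n_a, n_b = seq.count(a), seq.count(b)
    total = n_a + n_b
    return 0.0 if total == 0 else (n_a - n_b) / total


def windowed_skew(seq: str, a: str = "G", b: str = "C",
                  window: int = 100, step: int = 50):
    """Return (window start, skew) pairs along seq.

    window and step are arbitrary illustrative defaults.
    """
    seq = seq.upper()
    last_start = max(len(seq) - window, 0)
    return [(i, skew(seq[i:i + window], a, b))
            for i in range(0, last_start + 1, step)]


# Toy sequence: a G-rich half followed by a C-rich half.
print(windowed_skew("GGGGCCCC", window=4, step=4))
```

A change of sign along such a profile is the sort of signal that can flag a change in coding strand or a shift in editing pattern; calling `windowed_skew(seq, "A", "T")` gives the analogous AT skew.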
Nucleotide composition of T. cruzi maxicircle regions
A conserved element in the variable region of the non-coding sequences
Assembly of the CL Brener and Esmeraldo maxicircles allowed us to examine intra-specific variation at the genomic level. As the mitochondrial DNA of vertebrates is an excellent marker for intra-specific differences, the T. cruzi maxicircle is an interesting candidate for distinction between DTUs, and has been used previously to distinguish three clades [38, 40]. Sequence differences were found in the gene-coding and non-coding regions. We begin with consideration of the non-coding regions where the sequences are most divergent.
Conservation of the 325-bp element between CL Brener and Esmeraldo within the otherwise heterogeneous non-coding region implies a conserved function. A search for similar elements within the variable regions of T. brucei and L. tarentolae maxicircles, their entire genomes, and the NCBI databases yielded no match. However, the element is in the same region upstream of the 12S rRNA and shows a tandem organization similar to a duplicated sequence described in the Variable Region III of the T. brucei maxicircle. Of the two copies in T. brucei, the 5' copy is longer, while the 3' copy is shorter, analogous to the pattern in CL Brener. No sequence identity can be seen between the T. cruzi element and the T. brucei sequence, but the common organization within the non-coding regions is suggestive. The 39-bp palindrome sequence found in T. cruzi is an attractive candidate for a dimeric or multimeric protein complex binding site. A search for palindromes within the analogous tandem repeats of T. brucei revealed a 14-bp palindrome (TAAATTTAAATTTA) present in both repeats. The T. brucei palindrome is shorter and of different sequence composition relative to the T. cruzi palindrome.
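A DNA palindrome in this sense is a sequence equal to its own reverse complement. The following minimal sketch checks that property for the 14-bp T. brucei sequence quoted above and scans a sequence for perfect palindromes of a given length. The helper names and the flanking bases in the example are hypothetical, and the sketch would not detect an imperfect palindrome such as the 39-bp T. cruzi element.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if seq reads the same as its reverse complement."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


def find_palindromes(seq: str, length: int):
    """Start positions (0-based) of perfect palindromes of the given length."""
    seq = seq.upper()
    return [i for i in range(len(seq) - length + 1)
            if is_palindrome(seq[i:i + length])]


print(is_palindrome("TAAATTTAAATTTA"))
# The palindrome embedded between arbitrary flanking bases (hypothetical context).
print(find_palindromes("GG" + "TAAATTTAAATTTA" + "CC", 14))
```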
The repetitive portion of the non-coding region was visualized as a dense block of identity in dot plot analyses comparing this region to itself (Fig. 4A). In contrast to the variable region, the repetitive region showed no conservation between CL Brener and Esmeraldo, with unique repeated motifs of different size and sequence composition for each strain. Small motifs were conserved within the repeated blocks of sequence for each strain [see Additional file 4]. The CL Brener repetitive region is composed of variants of 267, 330, or 390 bp. Esmeraldo showed greater intrastrain variability with repeated elements of 296, 307, or 396 bp.
Previous comparisons of repetitive region motifs found no conservation between T. brucei and L. tarentolae. T. cruzi also shows no conservation of motifs between strains or among trypanosomatid species. All are A-rich with many tandem poly (A)-tracts and a greater than average AT skew (Table 2). Nucleotide composition analysis revealed a higher %AT in the repetitive region than the coding region or the overall maxicircle sequence for all trypanosomatid maxicircle sequences (Table 2). The variability of these repeats may be useful as specific strain or DTU markers.
Variability in the T. cruzi coding regions
Protein-coding regions provide specific markers for the variation among T. cruzi strains [38, 40] and, in contrast to the non-coding regions, they have a recognizable selective pressure. In considering the functional utility of the maxicircle as a molecular marker for strain identification and classification, the coding regions present variation that is likely conserved among DTUs or subgroups of strains, while the variable regions may allow differentiation among closely related isolates. No significant variation was observed within the consensus assembly of the coding region of each strain, indicating high intra-strain homogeneity of the maxicircle population. The coding region showed high levels of identity between the two T. cruzi strains, with some 'editable' differences observed at SNPs and indels in poly (T) tracts within edited genes. Comparisons of the coding regions from CL Brener and Esmeraldo showed identity ranging from 86.2% for extensively edited genes, to 89.9% for non-edited genes, to 92.6% for rRNAs (Table 3). Several examples of small indels as well as a substantial deletion within the coding region of the maxicircle with potential detrimental effects on protein translation will be described.
Truncation of two Esmeraldo maxicircle genes
The Esmeraldo CR4 is missing the first 34 nt of its coding region as defined in CL Brener (Fig. 5B). By analogy with the extensively edited T. brucei CR4, 50 nt of the edited ORF would be lost. An in-frame AUG, 15 nt downstream of the deletion, may allow for translation of a truncated protein of 123 aa, compared to 147 aa predicted for CL Brener. TMpred predicts the first two transmembrane domains of CR4 to be at aa 4–28 and aa 32–55, thus the truncated CR4 protein would lack the first transmembrane domain. The function of CR4 is unknown, but appears conserved with the T. brucei and L. tarentolae sequences that display 77% and 74% identity at the edited mRNA and predicted protein levels, respectively.
A BLAST search using the corresponding region from CL Brener confirmed that the deleted sequence is not present in any of the Esmeraldo reads; greater than 50X regional coverage gives high confidence that the deletion is found in all Esmeraldo maxicircles. To query directly the absence of this region in Esmeraldo, we examined Esmeraldo DNA along with six additional DTU IIb strains and 13 strains representing the five other DTUs (data not shown). PCR using primers flanking the Esmeraldo deletion showed that only Esmeraldo had this deletion, and no vestige of a full-length version was detected; neither of the DTU I representatives yielded amplification products, likely due to SNP variation in the primer annealing site.
The unique appearance of this deletion in the Esmeraldo strain suggests that it may be a recent event tolerated in a cultured strain, but deleterious in wild populations. The indels detailed in the following section point to additional anomalies in the Esmeraldo maxicircle that may be incompatible with virulence of this strain.
ND5, MURF1, and MURF2 frameshifts
The Esmeraldo ND5, which is not edited in T. brucei and seemingly does not require editing in CL Brener, has three frameshift mutations relative to the CL Brener sequence [see Additional file 5]. The first is an insertion at position 479, resulting in the introduction of two in-frame stop codons 14 and 34 nt downstream. The reading frame is restored by a single-nt deletion relative to the CL Brener sequence at position 514. Another single-nt deletion at position 1210 again results in a frameshift that alters the remainder of the ORF. Thus Esmeraldo ND5 translated directly from the DNA sequence would generate a truncated protein of 165 aa compared to the 589 aa product of CL Brener.
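How a single-nucleotide insertion can expose a premature in-frame stop codon can be shown with a toy example. The sequences below are invented for illustration and are not ND5. The sketch also assumes that only TAA and TAG act as stop codons, as in the mitochondrial genetic code in which TGA is read as tryptophan; that assumption should be checked against the translation table used for annotation.

```python
# Assumption: TGA is read as Trp in this mitochondrial code, so only
# TAA and TAG terminate translation.
STOPS = {"TAA", "TAG"}


def orf_length_aa(dna: str, start: int = 0) -> int:
    """Number of sense codons read from start until the first in-frame stop
    (or until the sequence runs out)."""
    dna = dna.upper()
    codons = 0
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOPS:
            break
        codons += 1
    return codons


reference = "ATGGCTAACGGGATATAA"            # ATG GCT AAC GGG ATA TAA (toy ORF)
mutant = reference[:3] + "C" + reference[3:]  # 1-nt insertion after the start codon

print(orf_length_aa(reference))  # codons before the natural stop
print(orf_length_aa(mutant))     # the shifted frame now reads ATG CGC TAA
```

In the toy mutant, the insertion moves an out-of-frame TAA into frame, truncating the predicted product in the same way as described for the Esmeraldo ND5 insertion.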
MURF1 is not well annotated in other species. 5' editing could create the initiation codon, but no analogous start codon is annotated in T. brucei or L. tarentolae. We have used the largest continuous ORF present in T. brucei to represent the start point for this gene. Using the equivalent start position in T. cruzi, the CL Brener MURF1 codes for a 446 aa protein. Esmeraldo MURF1 contains two single-nt deletions at positions 73 and 1225 relative to the CL Brener sequence, both of which result in internal frameshifts [see Additional file 5]. In addition, Esmeraldo shows a 9-bp insertion at positions 1315–1323. The deletion at position 73 may have no effect, as the 5' end of the ORF is undetermined. The frameshift at position 1225 introduces an in-frame stop codon that would result in a product truncated by 35 aa relative to CL Brener.
The T. brucei MURF2 mRNA is edited at the 5' end by the insertion of 24 uridine residues to create the initiation codon and the reading frame up to nt 45 of the ORF. This localized editing appears to be conserved in T. cruzi; a conserved amino terminus can be predicted after hypothetical RNA editing in both strains. However, CL Brener and Esmeraldo both show frameshift mutations in areas that are apparently not edited in T. brucei [see Additional file 5]. The CL Brener MURF2 had a 4-nt insertion relative to T. brucei after nt 266 of the predicted edited ORF, creating a frameshift that introduces a premature stop codon. Thus a 101-aa protein would result, containing the first 88 of 357 aa in common with T. brucei and L. tarentolae. The Esmeraldo MURF2 contained four insertion or deletion events relative to the T. brucei sequence: a 2-nt insertion at position 322, a 9-nt deletion at 493, and two single-nt deletions at positions 746 and 770. Cumulatively, these differences result in a frameshift starting at aa 181 and continuing through aa 249, where the frame reverts to the T. brucei cadence, leaving 69 aa in the central portion of the 357-aa protein altered. Approximately half of the amino acids in this shifted region are conservative changes relative to CL Brener.
We report the complete maxicircle consensus sequences from two strains of T. cruzi that are syntenic with the coding regions of maxicircles in T. brucei and L. tarentolae. All three mitochondrial genomes contain 18 protein-coding genes and two rRNAs. Strain-specific frameshifts were documented in several T. cruzi genes at positions not edited in other kinetoplastids. Comparison of the non-coding region of the two T. cruzi strains showed a duplicated conserved element containing an imperfect 39-bp palindrome located in the variable region, and divergent repetitive motifs of similar nucleotide composition in the repetitive region. Several variants of the CL Brener and Esmeraldo non-coding region were assembled, indicating that maxicircle sequence and size are heterogeneous in these strains. The size of the total assemblies differed between CL Brener and Esmeraldo strains due to differences in the repetitive area of the non-coding region; a strain-specific 236-nt deletion was found in the Esmeraldo coding region. These genomes represent two of the three T. cruzi maxicircle clades defined previously [38, 40].
The profound influence of RNA editing on maxicircle genes overshadows all other processes affecting mitochondrial genomes. The main evolutionary forces shaping the nucleotide composition and skew of metazoan mitochondrial genomes are postulated to be the asymmetric processes of DNA replication and RNA transcription, biasing nucleotide frequency due to differential mutation and selection pressures. While these processes may have some effect, RNA editing is the dominant influence affecting AT and GC skew on both strands of the coding region. Unlike metazoans, where all protein-coding genes are encoded on the same strand, maxicircle genes are transcribed as polycistronic transcripts from both strands [47, 53]; thus the influence of coding bias and directional mutational bias from transcription would be negated.
Generally, protozoan mitochondrial genomes display lower %GC than mitochondrial genomes from metazoans, except for those of insects, with the trypanosomatid mitochondria showing among the lowest %GC of all protozoans (Table 2). The extreme AT-richness of trypanosomatid mitochondrial genomes is due to the repetitive portion of the non-coding region (Table 2) that accounts for approximately 15–18% of the T. cruzi maxicircle sequence and 17–30% of the entire maxicircle sequence in various T. brucei strains. AT skew is strong in the non-coding region of the maxicircle, but not in the coding region, while GC skew is inverted. Different directional biases affect mutation rates in these two regions, related to gene content issues in the coding region and to potential structural and functional constraints on the non-coding region for kinetoplast organization, replication and division.
The poly (A) tracts of the repetitive region may induce bending of the DNA over long stretches. DNA bending was first discovered in minicircles, and is found in multiple eukaryotic and prokaryotic DNAs, especially in control regions such as promoters and origins of replication. In minicircles from many kinetoplastids, T. cruzi being an exception, bends are located adjacent to conserved sequence blocks that are likely binding sites for kinetoplast replication proteins. The bent region may serve a topological function for packing of the minicircles into the characteristic disc structure. Maxicircles in T. equiperdum are concatenated in a network independent of minicircles, linked together at a protease-resistant core. The bent nature of the repetitive region may form a solenoid structure that intertwines the maxicircles within this central core.
An intriguing conserved sequence element of ~300 bp is present in equivalent positions of the variable region of the T. cruzi CL Brener and Esmeraldo strains. A 39-bp imperfect palindrome within this element may serve as a specific recognition sequence for DNA binding proteins. Palindromes may serve as binding sites for dimeric or multimeric proteins at promoter regions and replication origins. Origins of replication have been mapped to the non-coding region of T. brucei and C. fasciculata. The T. brucei origin lies in the variable region just upstream of the 12S rRNA; two conserved 600-bp tandem repeats were found in the T. brucei maxicircle assembly [41, 50]. The T. cruzi and T. brucei elements show no conservation at the sequence level; however, their position and tandem arrangement suggest a common function. In mammalian mitochondria, transcription initiation and DNA replication begin at the same site just upstream of the rRNA genes. In T. brucei the 12S rRNA primary transcript initiates approximately 1.2 kb upstream of the gene, in the same region as the origin of replication. This variable region element may have a dual function as both replication origin and promoter.
The mixing of parental minicircles observed in T. brucei hybrids may be a method of ensuring preservation of the specific templates for RNA editing that are carried in the gRNA genes, countering loss of gRNAs through random segregation of the minicircles to daughter cells. gRNAs from different strains might bear little resemblance to one another at the sequence level, but their ability to direct the editing process may be indistinguishable relative to the mRNA. The physical mixing of the maxicircles during the fusion process may be inhibited by the structural organization of this relatively large molecule within the minicircle-clogged network. With the identification of multiple specific markers in the maxicircles of different strains, recombination or exchange during cellular fusion events may be detected. The spatial arrangement of the maxicircle within the kDNA network is not known, and the maxicircle genomes allow the design of specific probes to determine their organization within the kDNA disc.
Edited genes consistently show a lower percent identity than non-edited genes among the genera. This difference is minor between CL Brener and Esmeraldo, but increases in comparisons between the Trypanosoma species, and is greater still between the Trypanosoma spp. and L. tarentolae (Table 3). Editing thus corrects for genomic variability in the number and distribution of thymidine residues, allowing protein sequence conservation despite highly diverged genomic sequences. The ND7, COIII, and ATPase6 transcripts are extensively edited in T. brucei and T. cruzi, but only require 5' editing in L. tarentolae. ND8, ND9, ND3, CR3, CR4, and RPS12 are extensively edited in all three species, but the pre-edited genes are shorter in T. brucei and T. cruzi than in L. tarentolae. As such, Trypanosoma spp. maintain a smaller maxicircle coding region, but require more gRNA information than Leishmania. Trypanosoma branches early in the evolution of trypanosomatids, before the Leishmania clade, and in accordance with this evolutionary schema the greater number of extensively edited genes in the Trypanosoma may more closely resemble the ancestral state of the Trypanosomatina. The loss of extensive RNA editing seen in Leishmania would represent a derived state [48, 57].
The pressure to maintain extensive amounts of RNA editing might be revealed by the consequences of the process. The size of the primary gene template encoded in the maxicircle is radically minimized, with up to half of a gene's coding information stored in gRNAs whose function is largely unaffected by transition mutations. Insertions and deletions of thymidines in the edited gene coding regions are also tolerated due to their subsequent reshuffling during RNA editing. RNA editing can compensate for certain DNA mutations and allow for production of wild-type functional proteins. RNA editing is likely to have evolved in a free-living ancestor of trypanosomatids that migrated between aerobic and anaerobic environments. Thus protein functionality may be maintained despite mutation over many generations in the absence of selection, while the cells live in an environment that relieves them of the necessity to perform oxidative phosphorylation or other aerobic mitochondrial activities.
The T. cruzi strain-specific indels of Esmeraldo and CL Brener in gene coding regions not edited in T. brucei may represent mutations that are tolerated in a cultured strain, but prohibitive to completion of the digenetic life cycle, as is seen in the degenerate UC strain of L. tarentolae. In short, they could represent laboratory-generated dead ends. However, these mutations may give insight into evolution of the RNA editing process. The strong uridine bias evident in the non-edited genes suggests that all of the genes were at one time subject to extensive RNA editing. The persistence of 5'-end editing supports the theory that reverse transcriptase-mediated recombination of partially edited intermediates eliminated the need for whole-ORF editing in several instances. Perhaps residual gRNAs for non-edited genes are maintained in the heterogeneous minicircle population due to the structure of the minicircles themselves in T. cruzi and T. brucei. While in L. tarentolae a single gRNA is contained on each minicircle, the Trypanosoma minicircles carry two to four genes. If maxicircle genes are under pressure to 'escape' from editing, the single-gRNA minicircles of Leishmania are more easily lost from the population than a gRNA in the triplet or quartet minicircle arrangements of Trypanosoma, effectively pushing Leishmania toward a more rapid loss of editing information. As such, the Trypanosoma spp. may carry an extra load in the form of gRNAs for genes that no longer require editing, but that could be utilized in the repair of indel mutations in normally unedited gene regions as a post-transcriptional proofreading mechanism. With the further characterization of the minicircle populations and RNA editing events of the CL Brener and Esmeraldo strains, these possibilities can be addressed directly.
The complete assembly of two T. cruzi maxicircle genomes was performed using data generated by genome sequencing projects for the CL Brener and Esmeraldo strains. These sequences represent two of the three maxicircle mitochondrial genome clades of T. cruzi. The coding region of both maxicircles shows conservation of gene content and order with other kinetoplastids, with similar editing patterns conserved within the Trypanosoma. Strain-specific indels may indicate unique editing events or non-functional genes. The non-coding regions share duplicated, conserved elements of approximately 325 bp situated upstream of the ribosomal RNA genes that may serve as origins of replication and transcription initiation sites. Repetitive portions of the non-coding region are distinct for each strain, but display similarly poly-A rich sequences that may induce DNA bending and serve a structural function.
The strategy employed in the construction of the CL Brener and Esmeraldo maxicircles comprised gathering and assembling reads generated by the TIGR-SBRI-KI T. cruzi Sequencing Consortium (TSK-TSC) that showed high identity, as determined by BLAST, to previously published maxicircle sequences: a 5.7-kb fragment of the T. cruzi maxicircle [GenBank:U43567], the T. brucei maxicircle [GenBank:M94286] and the L. tarentolae maxicircle [GenBank:M10126]. Several assembly software packages were used with equivalent results: Celera Assembler, Phrap and SeqMan (Lasergene, DNAStar). All assembly software was run with default parameters. In a second strategy, contigs generated by the T. cruzi genome consortium that had high identity to the maxicircle sequences listed above were merged. Other contigs were iteratively added to this initial sequence based on sequence identity and mate-pair linkage. The iteration ended when the generated sequence had similar patterns at both ends, taken as evidence that a circular sequence had been obtained. The sequence was then dismantled and its contigs were submitted to scaffolding with the Bambus software. The final maxicircle sequence was obtained by manual overlapping and concatenation of the Bambus scaffolds. Both strategies generated identical results for the coding region sequence, but only the latter produced a putative circular sequence. Consensus maxicircle sequences for T. cruzi strains CL Brener and Esmeraldo are available through [GenBank:DQ343645] and [GenBank:DQ343646], respectively.
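The stopping criterion for the iterative extension (similar patterns at both ends of the growing sequence) can be illustrated with a minimal sketch. It assumes that "similar" means an exact overlap between the start and the end of the sequence; the function names and the `min_len` threshold are hypothetical and are not taken from the assembly software used in this study, which tolerates mismatches and uses mate-pair information.

```python
def terminal_overlap(seq: str, min_len: int = 20) -> int:
    """Return the length of the longest suffix of seq that equals a prefix of seq.

    Returns 0 when no exact overlap of at least min_len bases exists.
    Assumption: exact matching only; real assemblers allow mismatches.
    """
    seq = seq.upper()
    # Try the longest candidate first; the overlap must be shorter than seq.
    for k in range(len(seq) - 1, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(seq: str, min_len: int = 20) -> str:
    """Drop the redundant terminal copy so the circle is represented once."""
    k = terminal_overlap(seq, min_len)
    return seq[:-k] if k else seq
```

The scan is quadratic in the worst case, which is acceptable for a sequence of maxicircle size but would not suit a whole-genome assembly. A non-zero overlap only suggests circularity; long repeats, such as those in the maxicircle non-coding region, can produce the same signal.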
Manual annotation of gene coding regions was performed by comparison to the published T. brucei maxicircle sequence, [GenBank:M94286]; the L. tarentolae maxicircle sequence, [GenBank:M10126]; the T. brucei maxicircle variable region, [GenBank:Z15118]; and further annotation of T. brucei and L. tarentolae by Larry Simpson, available through his Uridine-Insertion/Deletion Edited Sequence Database.
Linear annotated maps of the maxicircle sequence and TA skew, GC skew and GC percentage graphs of the coding region were created using Artemis. Circular maps were generated using BioEdit. Dotplot graphs of T. cruzi sequence plotted against T. brucei and L. tarentolae, showing exact matches with a 10-bp word size, were generated with the Dottup application from the EMBOSS software suite. Dotplot graphs of the CL Brener and Esmeraldo maxicircle variable regions plotted against themselves and each other were generated using the Dotter software package. Dotter uses greyscale to display less perfect matches as lighter shaded dots, and also displays matches to the reverse complement.
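The two kinds of plot described above rest on simple calculations, sketched below under stated assumptions. `word_matches` finds the exact shared words that a word-based dotplot draws as dots; `skew` computes a strand-asymmetry value for one window. The function names, the window and step sizes, and the sign convention (T minus A, G minus C) are illustrative choices and are not claimed to match the internals of Dottup or Artemis.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, word_size=10):
    """Return (i, j) pairs where seq_a[i:i+word_size] == seq_b[j:j+word_size]."""
    # Index every word of seq_b by its start positions.
    index = defaultdict(list)
    for j in range(len(seq_b) - word_size + 1):
        index[seq_b[j:j + word_size]].append(j)
    # Look up each word of seq_a; every hit is one dot in the plot.
    hits = []
    for i in range(len(seq_a) - word_size + 1):
        for j in index.get(seq_a[i:i + word_size], ()):
            hits.append((i, j))
    return hits


def skew(window, plus, minus):
    """Return (plus - minus) / (plus + minus) for one window, e.g. GC skew."""
    p, m = window.count(plus), window.count(minus)
    return (p - m) / (p + m) if p + m else 0.0


def sliding_skew(seq, plus, minus, window=100, step=50):
    """Return (window start, skew) pairs along seq (hypothetical defaults)."""
    return [(s, skew(seq[s:s + window], plus, minus))
            for s in range(0, max(len(seq) - window, 0) + 1, step)]
```

Calling `sliding_skew(seq, "G", "C")` gives a GC skew profile and `sliding_skew(seq, "T", "A")` a TA skew profile. Matches to the reverse complement, which Dotter also displays, are not covered by this sketch.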
Alignments of the coding regions of the maxicircle genes and translated protein products of T. cruzi, T. brucei, and L. tarentolae were performed using ClustalX. Alignments were adjusted manually using BioEdit. BioEdit was used to calculate nucleotide composition and percent identity matrices.
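As an illustration of the two summary statistics named above, the following sketch computes nucleotide composition and a pairwise percent identity matrix from pre-aligned sequences. It is not BioEdit's implementation: the function names are hypothetical, and the treatment of gaps (columns gapped in both sequences are skipped, a gap opposite a base counts as a mismatch) is an assumption, since tools differ on this point.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Return the fraction of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}


def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # assumption: ignore columns gapped in both sequences
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Return {(name1, name2): percent identity} for a dict of aligned sequences."""
    return {(m, n): percent_identity(aligned[m], aligned[n])
            for m in aligned for n in aligned}
```

Because the gap convention changes the denominator, identity values for extensively edited genes, where alignments contain many gaps, can differ noticeably between tools.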
MEME/MAST software was used to define and locate motifs within the repetitive region of the maxicircles. Multiple occurrences of each motif were aligned and submitted to WebLogo. This software generates a graphical representation of multiple alignments where the character height represents the degree of conservation of each monomer.
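The relationship between character height and conservation in a sequence logo can be sketched as follows. The information content of an alignment column is taken as the maximum entropy of the alphabet (2 bits for DNA) minus the observed Shannon entropy, and each character is scaled by its frequency in the column. This is a simplified illustration with hypothetical function names; in particular, it omits the small-sample correction that logo software typically applies.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Information content of one alignment column, in bits."""
    symbols = [c for c in column.upper() if c in alphabet]
    if not symbols:
        return 0.0  # a column of gaps carries no information
    n = len(symbols)
    entropy = -sum((k / n) * math.log2(k / n)
                   for k in Counter(symbols).values())
    return math.log2(len(alphabet)) - entropy


def logo_heights(aligned_sites, alphabet="ACGT"):
    """For each column, map each observed symbol to its height in bits."""
    heights = []
    for column in zip(*aligned_sites):
        text = "".join(column).upper()
        symbols = [c for c in text if c in alphabet]
        info = column_information(text, alphabet)
        counts = Counter(symbols)
        # Height of a symbol = its frequency times the column information.
        heights.append({s: info * counts[s] / len(symbols) for s in counts})
    return heights
```

A fully conserved DNA position therefore reaches 2 bits, while a position with all four bases at equal frequency has a height of zero.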
We thank all the collaborators on the T. cruzi genome sequencing project for generating the dataset upon which this study was based; Larry Simpson for the generation and maintenance of the RNA editing website; and Dan Ray, Robert Hitchcock, Sean Thomas, Jesse Zamudio and L.L. Isadora Trejo Martinez for helpful discussions and/or critical reading of the manuscript. Primary sequence data from the T. cruzi CL Brener and Esmeraldo strains were obtained from The Institute for Genomic Research website. Sequencing of T. cruzi was funded by the National Institute of Allergy and Infectious Diseases (NIAID). S.J.W. is a pre-doctoral trainee of the UCLA Bioinformatics Integrative Graduate Education and Research Traineeship program funded by NSF grant DGE9987641.
- Souza W: Novel Cell Biology of Trypanosoma cruzi. American Trypanosomiasis World Class Parasites: Volume 7. Edited by: Miles MATKM. 2003, Boston, Springer, 13-24.
- Simpson L, Neckelmann N, de la Cruz VF, Simpson AM, Feagin JE, Jasmer DP, Stuart JE: Comparison of the maxicircle (mitochondrial) genomes of Leishmania tarentolae and Trypanosoma brucei at the level of nucleotide sequence. J Biol Chem. 1987, 262 (13): 6182-6196.
- Feagin JE, Shaw JM, Simpson L, Stuart K: Creation of AUG initiation codons by addition of uridines within cytochrome b transcripts of kinetoplastids. Proc Natl Acad Sci U S A. 1988, 85 (2): 539-543. 10.1073/pnas.85.2.539.
- Shaw JM, Feagin JE, Stuart K, Simpson L: Editing of kinetoplastid mitochondrial mRNAs by uridine addition and deletion generates conserved amino acid sequences and AUG initiation codons. Cell. 1988, 53 (3): 401-411. 10.1016/0092-8674(88)90160-2.
- Payne M, Rothwell V, Jasmer DP, Feagin JE, Stuart K: Identification of mitochondrial genes in Trypanosoma brucei and homology to cytochrome c oxidase II in two different reading frames. Mol Biochem Parasitol. 1985, 15 (2): 159-170. 10.1016/0166-6851(85)90117-3.
- Benne R, Van den Burg J, Brakenhoff JP, Sloof P, Van Boom JH, Tromp MC: Major transcript of the frameshifted coxII gene from trypanosome mitochondria contains four nucleotides that are not encoded in the DNA. Cell. 1986, 46 (6): 819-826. 10.1016/0092-8674(86)90063-2.
- Shaw JM, Campbell D, Simpson L: Internal frameshifts within the mitochondrial genes for cytochrome oxidase subunit II and maxicircle unidentified reading frame 3 of Leishmania tarentolae are corrected by RNA editing: evidence for translation of the edited cytochrome oxidase subunit II mRNA. Proc Natl Acad Sci U S A. 1989, 86 (16): 6220-6224. 10.1073/pnas.86.16.6220.
- van der Spek H, van den Burg J, Croiset A, van den Broek M, Sloof P, Benne R: Transcripts from the frameshifted MURF3 gene from Crithidia fasciculata are edited by U insertion at multiple sites. EMBO J. 1988, 7 (8): 2509-2514.
- Feagin JE, Abraham JM, Stuart K: Extensive editing of the cytochrome c oxidase III transcript in Trypanosoma brucei. Cell. 1988, 53 (3): 413-422. 10.1016/0092-8674(88)90161-4.
- Simpson L, Sbicego S, Aphasizhev R: Uridine insertion/deletion RNA editing in trypanosome mitochondria: a complex business. RNA. 2003, 9 (3): 265-276. 10.1261/rna.2178403.
- Stuart K, Panigrahi AK: RNA editing: complexity and complications. Mol Microbiol. 2002, 45 (3): 591-596. 10.1046/j.1365-2958.2002.03028.x.
- Uridine-Insertion/Deletion Edited Sequence Database. [http://dna.kdna.ucla.edu/trypanosome/database.html]
- Avila HA, Simpson L: Organization and complexity of minicircle-encoded guide RNAs in Trypanosoma cruzi. RNA. 1995, 1 (9): 939-947.
- Avila H, Goncalves AM, Nehme NS, Morel CM, Simpson L: Schizodeme analysis of Trypanosoma cruzi stocks from South and Central America by analysis of PCR-amplified minicircle variable region sequences. Mol Biochem Parasitol. 1990, 42 (2): 175-187. 10.1016/0166-6851(90)90160-N.
- Macina RA, Sanchez DO, Affranchino JL, Engel JC, Frasch AC: Polymorphisms within minicircle sequence classes in the kinetoplast DNA of Trypanosoma cruzi clones. Mol Biochem Parasitol. 1985, 16 (1): 61-74. 10.1016/0166-6851(85)90049-0.
- Macina RA, Sanchez DO, Gluschankof DA, Burrone OR, Frasch AC: Sequence diversity in the kinetoplast DNA minicircles of Trypanosoma cruzi. Mol Biochem Parasitol. 1986, 21 (1): 25-32. 10.1016/0166-6851(86)90075-7.
- Sanchez DO, Madrid R, Engel JC, Frasch AC: Rapid identification of Trypanosoma cruzi isolates by 'dot-spot' hybridization. FEBS Lett. 1984, 168 (1): 139-142. 10.1016/0014-5793(84)80223-9.
- Junqueira AC, Degrave W, Brandao A: Minicircle organization and diversity in Trypanosoma cruzi populations. Trends Parasitol. 2005, 21 (6): 270-272. 10.1016/j.pt.2005.04.001.
- Morel CM, Lazdins J: Chagas disease. Nat Rev Microbiol. 2003, 1 (1): 14-15. 10.1038/nrmicro735.
- Campbell DA, Westenberger SJ, Sturm NR: The determinants of Chagas disease: connecting parasite and host genetics. Curr Mol Med. 2004, 4 (6): 549-562. 10.2174/1566524043360249.
- Anonymous: Recommendations from a satellite meeting. Mem Inst Oswaldo Cruz. 1999, 94: 429-432.
- Brisse S, Barnabé C, Tibayrenc M: Identification of six Trypanosoma cruzi phylogenetic lineages by random amplified polymorphic DNA and multilocus enzyme electrophoresis. Int J Parasitol. 2000, 30 (1): 35-44. 10.1016/S0020-7519(99)00168-X.
- Brisse S, Dujardin JC, Tibayrenc M: Identification of six Trypanosoma cruzi lineages by sequence-characterised amplified region markers. Mol Biochem Parasitol. 2000, 111 (1): 95-105. 10.1016/S0166-6851(00)00302-9.
- Brisse S, Verhoef J, Tibayrenc M: Characterisation of large and small subunit rRNA and mini-exon genes further supports the distinction of six Trypanosoma cruzi lineages. Int J Parasitol. 2001, 31 (11): 1218-1226. 10.1016/S0020-7519(01)00238-7.
- Westenberger SJ, Barnabé C, Campbell DA, Sturm NR: Two hybridization events define the population structure of Trypanosoma cruzi. Genetics. 2005, 171 (2): 527-543. 10.1534/genetics.104.038745.
- Engman DM, Leon JS: Pathogenesis of Chagas heart disease: role of autoimmunity. Acta Trop. 2002, 81 (2): 123-132. 10.1016/S0001-706X(01)00202-9.
- Leon JS, Engman DM: Autoimmunity in Chagas heart disease. Int J Parasitol. 2001, 31 (5-6): 555-561.
- Tarleton RL, Zhang L: Chagas disease etiology: autoimmunity or parasite persistence?. Parasitol Today. 1999, 15 (3): 94-99. 10.1016/S0169-4758(99)01398-8.
- Tarleton RL: Parasite persistence in the aetiology of Chagas disease. Int J Parasitol. 2001, 31 (5-6): 550-554.
- Tarleton RL: Chagas disease: a role for autoimmunity?. Trends Parasitol. 2003, 19 (10): 447-451. 10.1016/j.pt.2003.08.008.
- Leon JS, Engman DM: The significance of autoimmunity in the pathogenesis of Chagas heart disease. Front Biosci. 2003, 8: e315-22.
- Iwai LK, Juliano MA, Juliano L, Kalil J, Cunha-Neto E: T-cell molecular mimicry in Chagas disease: identification and partial structural analysis of multiple cross-reactive epitopes between Trypanosoma cruzi B13 and cardiac myosin heavy chain. J Autoimmun. 2005, 24 (2): 111-117. 10.1016/j.jaut.2005.01.006.
- Simoes-Barbosa A, Barros AM, Nitz N, Arganaraz ER, Teixeira AR: Integration of Trypanosoma cruzi kDNA minicircle sequence in the host genome may be associated with autoimmune serum factors in Chagas disease patients. Mem Inst Oswaldo Cruz. 1999, 94 (Suppl 1): 249-252.
- Teixeira AR, Arganaraz ER, Freitas LHJ, Lacava ZG, Santana JM, Luna H: Possible integration of Trypanosoma cruzi kDNA minicircles into the host cell genome by infection. Mutat Res. 1994, 305 (2): 197-209.
- Affranchino JL, Sanchez DO, Engel JC, Frasch AC, Stoppani AO: Trypanosoma cruzi: structure and transcription of kinetoplast DNA maxicircles of cloned stocks. J Protozool. 1986, 33 (4): 503-507.
- Leon W, Frasch AC, Hoeijmakers JH, Fase-Fowler F, Borst P, Brunel F, Davison J: Maxi-circles and mini-circles in kinetoplast DNA from Trypanosoma cruzi. Biochim Biophys Acta. 1980, 607 (2): 221-231.
- Ochs DE, Otsu K, Teixeira SM, Moser DR, Kirchhoff LV: Maxicircle genomic organization and editing of an ATPase subunit 6 RNA in Trypanosoma cruzi. Mol Biochem Parasitol. 1996, 76 (1-2): 267-278. 10.1016/0166-6851(95)02565-0.
- Brisse S, Henriksson J, Barnabé C, Douzery EJ, Berkvens D, Serrano M, De Carvalho MR, Buck GA, Dujardin JC, Tibayrenc M: Evidence for genetic exchange and hybridization in Trypanosoma cruzi based on nucleotide sequences and molecular karyotype. Infect Genet Evol. 2003, 2 (3): 173-183. 10.1016/S1567-1348(02)00097-7.
- Lake JA, de la Cruz VF, Ferreira PC, Morel C, Simpson L: Evolution of parasitism: kinetoplastid protozoan history reconstructed from mitochondrial rRNA gene sequences. Proc Natl Acad Sci U S A. 1988, 85 (13): 4779-4783. 10.1073/pnas.85.13.4779.
- Machado CA, Ayala FJ: Nucleotide sequences provide evidence of genetic exchange among distantly related lineages of Trypanosoma cruzi. Proc Natl Acad Sci U S A. 2001, 98 (13): 7396-7401. 10.1073/pnas.121187198.
- Gaunt MW, Yeo M, Frame IA, Stothard JR, Carrasco HJ, Taylor MC, Mena SS, Veazey P, Miles GA, Acosta N, de Arias AR, Miles MA: Mechanism of genetic exchange in American trypanosomes. Nature. 2003, 421 (6926): 936-939. 10.1038/nature01438.
- El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal G, Tran AN, Ghedin E, Worthey EA, Delcher AL, Blandin G, Westenberger SJ, Caler E, Cerqueira GC, Branche C, Haas B, Anupama A, Arner E, Aslund L, Attipoe P, Bontempi E, Bringaud F, Burton P, Cadag E, Campbell DA, Carrington M, Crabtree J, Darban H, da Silveira JF, de Jong P, Edwards K, Englund PT, Fazelina G, Feldblyum T, Ferella M, Frasch AC, Gull K, Horn D, Hou L, Huang Y, Kindlund E, Klingbeil M, Kluge S, Koo H, Lacerda D, Levin MJ, Lorenzi H, Louie T, Machado CR, McCulloch R, McKenna A, Mizuno Y, Mottram JC, Nelson S, Ochaya S, Osoegawa K, Pai G, Parsons M, Pentony M, Pettersson U, Pop M, Ramirez JL, Rinta J, Robertson L, Salzberg SL, Sanchez DO, Seyler A, Sharma R, Shetty J, Simpson AJ, Sisk E, Tammi MT, Tarleton R, Teixeira S, Van Aken S, Vogt C, Ward PN, Wickstead B, Wortman J, White O, Fraser CM, Stuart KD, Andersson B: The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science. 2005, 309 (5733): 409-415. 10.1126/science.1112631.
- Thomas S, Westenberger SJ, Campbell DA, Sturm NR: Intragenomic spliced leader RNA array analysis of kinetoplastids reveals unexpected transcribed region diversity in Trypanosoma cruzi. Gene. 2005, 352: 100-108. 10.1016/j.gene.2005.04.002.
- Thiemann OH, Maslov DA, Simpson L: Disruption of RNA editing in Leishmania tarentolae by the loss of minicircle-encoded guide RNA genes. EMBO J. 1994, 13 (23): 5689-5700.
- Shioiri C, Takahata N: Skew of mononucleotide frequencies, relative abundance of dinucleotides, and DNA strand asymmetry. J Mol Evol. 2001, 53 (4-5): 364-376. 10.1007/s002390010226.
- Simpson L, Maslov DA: Evolution of the U-insertion/deletion RNA editing in mitochondria of kinetoplastid protozoa. Ann N Y Acad Sci. 1999, 870: 190-205. 10.1111/j.1749-6632.1999.tb08879.x.
- Feagin JE, Jasmer DP, Stuart K: Apocytochrome b and other mitochondrial DNA sequences are differentially expressed during the life cycle of Trypanosoma brucei. Nucleic Acids Res. 1985, 13 (12): 4577-4596.
- Maslov DA, Avila HA, Lake JA, Simpson L: Evolution of RNA editing in kinetoplastid protozoa. Nature. 1994, 368 (6469): 345-348. 10.1038/368345a0.
- Landweber LF, Gilbert W: RNA editing as a source of genetic variation. Nature. 1993, 363 (6425): 179-182. 10.1038/363179a0.
- Myler PJ, Glick D, Feagin JE, Morales TH, Stuart KD: Structural organization of the maxicircle variable region of Trypanosoma brucei: identification of potential replication origins and topoisomerase II binding sites. Nucleic Acids Res. 1993, 21 (3): 687-694.
- Swiss EMBnet node server. [http://www.ch.EMBnet.org]
- Francino MP, Ochman H: Strand asymmetries in DNA evolution. Trends Genet. 1997, 13 (6): 240-245. 10.1016/S0168-9525(97)01118-9.
- Read LK, Myler PJ, Stuart K: Extensive editing of both processed and preprocessed maxicircle CR6 transcripts in Trypanosoma brucei. J Biol Chem. 1992, 267 (2): 1123-1128.
- Marini JC, Levene SD, Crothers DM, Englund PT: A bent helix in kinetoplast DNA. Cold Spring Harb Symp Quant Biol. 1983, 47 Pt 1: 279-283.
- Wu HM, Crothers DM: The locus of sequence-directed and protein-induced DNA bending. Nature. 1984, 308 (5959): 509-513. 10.1038/308509a0.
- Ntambi JM, Marini JC, Bangs JD, Hajduk SL, Jimenez HE, Kitchin PA, Klein VA, Ryan KA, Englund PT: Presence of a bent helix in fragments of kinetoplast DNA minicircles from several trypanosomatid species. Mol Biochem Parasitol. 1984, 12 (3): 273-286. 10.1016/0166-6851(84)90084-7.
- Arts GJ, Benne R: Mechanism and evolution of RNA editing in kinetoplastida. Biochim Biophys Acta. 1996, 1307 (1): 39-54.
- Ryan KA, Shapiro TA, Rauch CA, Englund PT: Replication of kinetoplast DNA in trypanosomes. Annu Rev Microbiol. 1988, 42: 339-358. 10.1146/annurev.mi.42.100188.002011.
- Shapiro TA: Kinetoplast DNA maxicircles: networks within networks. Proc Natl Acad Sci U S A. 1993, 90 (16): 7809-7813. 10.1073/pnas.90.16.7809.
- Gibson W, Crow M, Kearns J: Kinetoplast DNA minicircles are inherited from both parents in genetic crosses of Trypanosoma brucei. Parasitol Res. 1997, 83 (5): 483-488. 10.1007/s004360050284.
- Savill NJ, Higgs PG: A theoretical study of random segregation of minicircles in trypanosomatids. Proc Biol Sci. 1999, 266 (1419): 611-620. 10.1098/rspb.1999.0680.
- Hughes AL, Piontkivska H: Phylogeny of Trypanosomatidae and Bodonidae (Kinetoplastida) Based on 18S rRNA: Evidence for Paraphyly of Trypanosoma and Six Other Genera. Mol Biol Evol. 2003, 20 (4): 644-652. 10.1093/molbev/msg062.
- Cavalier-Smith T: Cell and genome coevolution: facultative anaerobiosis, glycosomes and kinetoplastan RNA editing. Trends Genet. 1997, 13 (1): 6-9. 10.1016/S0168-9525(96)30116-9.
- Savill NJ, Higgs PG: Redundant and non-functional guide RNA genes in Trypanosoma brucei are a consequence of multiple genes per minicircle. Gene. 2000, 256 (1-2): 245-252. 10.1016/S0378-1119(00)00345-0.
- Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC: A whole-genome assembly of Drosophila. Science. 2000, 287 (5461): 2196-2204. 10.1126/science.287.5461.2196.
- Laboratory of Phil Green website.
- Pop M, Kosack DS, Salzberg SL: Hierarchical scaffolding with Bambus. Genome Res. 2004, 14 (1): 149-159. 10.1101/gr.1536204.
- Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16 (10): 944-945. 10.1093/bioinformatics/16.10.944.
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999, 41: 95-98.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
- Sonnhammer EL, Durbin R: A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene. 1995, 167 (1-2): GC1-10. 10.1016/0378-1119(95)00714-8.
- THE MEME / MAST SYSTEM Motif Discovery and Search Version 3.5.2. [http://meme.sdsc.edu/meme/intro.html]
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14 (6): 1188-1190. 10.1101/gr.849004.
- The Institute for Genomic Research website. [http://www.tigr.org]