Complete mitochondrial genome sequence of Urechis caupo, a representative of the phylum Echiura
© Boore; licensee BioMed Central Ltd. 2004
Received: 10 February 2004
Accepted: 15 September 2004
Published: 15 September 2004
Mitochondria contain small genomes that are physically separate from those of nuclei. Their comparison serves as a model system for understanding the processes of genome evolution. Although hundreds of these genome sequences have been reported, the taxonomic sampling is highly biased toward vertebrates and arthropods, with many whole phyla remaining unstudied. This is the first description of a complete mitochondrial genome sequence of a representative of the phylum Echiura, that of the fat innkeeper worm, Urechis caupo.
This mtDNA is 15,113 nts in length and 62% A+T. It contains the 37 genes that are typical for animal mtDNAs in an arrangement somewhat similar to that of annelid worms. All genes are encoded by the same DNA strand which is rich in A and C relative to the opposite strand. Codons ending with the dinucleotide GG are more frequent than would be expected from apparent mutational biases. The largest non-coding region is only 282 nts long, is 71% A+T, and has potential for secondary structures.
Urechis caupo mtDNA shares many features with those of the few studied annelids, including the common usage of ATG start codons, unusual among animal mtDNAs, as well as gene arrangements, tRNA structures, and codon usage biases.
Keywords: mtDNA, evolution, gene rearrangement, annelid, strand skew
Mitochondrial genomes are physically separate from the nuclear genome. For animals, they are typically circular with a compact arrangement of an identical set of 37 genes. For some animals, all genes are on the same strand, whereas for others they are divided between the two. The arrangement of these genes can remain stable for long periods of time; for example, human and shark mtDNAs have the same gene arrangement, as do those of fruit fly and shrimp. However, in other lineages, rearrangements have occurred much more rapidly. Many of the same processes that occur in large and complex nuclear genomes also take place in these diminutive genomes, so comparisons among mtDNAs can address general questions of genome evolution, but in a model system that is much more tractable for a large number of taxa.
Toward this end, this article describes the complete mtDNA sequence of the fat innkeeper worm, Urechis caupo, the first example from the phylum Echiura. Echiurans comprise about 150 species and are commonly called spoon worms because of the shape of their extensible proboscis. Unlike annelids, they have no overt segmentation, but they develop via trochophore larvae very similar to those of annelids. U. caupo is a pink, sausage-shaped worm that lives in U-shaped burrows in the mud or sand of the intertidal or subtidal zones. Unlike other echiurans, it feeds on plankton, which it filters using an elaborate mucus net.
Results and discussion
Gene content and organization
Mitochondrial gene arrangement identities found in pairwise comparisons between Urechis caupo and various animals. Full taxon names are given here for the annelids Lumbricus terrestris and Platynereis dumerilii, the mollusks Katharina tunicata, Loligo bleekeri, Cepaea nemoralis, and Mytilus edulis, the brachiopods Terebratulina retusa and Terebratalia transversa, the platyhelminths Fasciola hepatica, Taenia crassiceps, Echinococcus multilocularis, and Hymenolepis diminuta, the arthropods Drosophila yakuba, Anopheles gambiae, Artemia franciscana, Daphnia pulex, Apis mellifera, Locusta migratoria, Ixodes hexagonus, Rhipicephalus sanguineus, Limulus polyphemus, and Lithobius forficatus, the nematodes Trichinella spiralis, Onchocerca volvulus, Meloidogyne javanica, Ascaris suum, and Caenorhabditis elegans, the echinoderms Arbacia lixula, Asterina pectinifera, Paracentrotus lividus, Strongylocentrotus purpuratus, and Florometra serratissima, the hemichordate Balanoglossus carnosus, and the chordate Branchiostoma floridae, along with the gene order most typical for vertebrates. Complete citations can be found in Boore (1999) or updated by following the "Evolutionary Genomics" link at http://www.jgi.doe.gov/. Genes within a contiguous arrangement are separated by commas; a slash indicates a gap containing one or more unrelated genes.
L. terrestris and P. dumerilii
cox3, trnQ, nad6, cob, trnW, atp6, trnR, trnH, nad5, trnF/trnL2, nad1, trnI, trnK, nad3/trnT, nad4L, nad4
L. terrestris but not P. dumerilii
trnL1, trnA, trnS2, trnL2/trnD, atp8
trnL2, nad1/nad4L, nad4/trnH, nad5, trnF
nad6, cob/nad4L, nad4/nad5, trnF/trnD, atp8
trnL2, nad1/trnT, nad4L
trnL2, nad1/nad4L, nad4/trnH, nad5, trnF/trnL1, trnA/trnD, atp8/cox1, cox2
F. hepatica, T. crassiceps, E. multilocularis
trnI, trnK, nad3/nad4L, nad4/trnY, trnL1/trnS2, trnL2
trnI, trnK, nad3/nad4L, nad4/trnY, trnL1
D. yakuba, A. gambiae, A. franciscana, D. pulex
nad6, cob/nad4L, nad4/trnH, nad5, trnF/trnD, atp8
A. mellifera, L. migratoria
nad6, cob/nad4L, nad4/trnH, nad5, trnF
I. hexagonus, R. sanguineus, L. polyphemus, L. forficatus
nad6, cob/trnL2, nad1/nad4L, nad4/trnH, nad5, trnF/trnD, atp8/cox1, cox2
nad6, cob/nad4L, nad4/trnH, nad5, trnF/trnR, trnH/trnL1, trnA/trnD, atp8/cox1, cox2
A. lixula, A. pectinifera, P. lividus and S. purpuratus
trnL2, nad1, trnI
nad1, trnI/nad2, trnY
trnL2, nad1/nad4L, nad4
B. floridae and the typical vertebrate arrangement
trnL2, nad1, trnI/nad4L, nad4
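The kind of pairwise comparison summarized in this table can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the table: the function name `shared_blocks` and the two short gene orders are hypothetical, and the sketch treats each order as a linear list on one strand, ignoring circularity and gene orientation.

```python
def shared_blocks(order_a, order_b):
    """Return maximal runs of two or more genes that are adjacent,
    in the same order, in both gene lists."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        prev = current[-1] if current else None
        # Extend the run only if this gene directly follows prev in order_b.
        if prev in pos_b and gene in pos_b and pos_b[gene] == pos_b[prev] + 1:
            current.append(gene)
        else:
            if len(current) > 1:
                blocks.append(current)
            current = [gene]
    if len(current) > 1:
        blocks.append(current)
    return blocks

# Hypothetical toy fragments, chosen only to exercise the function.
toy_a = ["trnL2", "nad1", "trnI", "trnK", "nad3", "trnT", "nad4L", "nad4"]
toy_b = ["nad4L", "nad4", "trnH", "trnL2", "nad1", "trnI", "nad2"]

# Print in the table's notation: commas within a block, slashes between blocks.
print("/".join(", ".join(block) for block in shared_blocks(toy_a, toy_b)))
```

For these toy inputs the output follows the table's notation, with commas inside each shared block and a slash between blocks.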
Base composition and codon usage
Codon usage in the 13 protein-encoding genes of the Urechis caupo mitochondrial genome. The total number of codons is 3722. The anticodon of the corresponding tRNA gene is shown in parentheses below each amino acid designation. Stop codons are not included in this analysis.
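A codon usage tally of this kind can be sketched as follows. This is a minimal illustration rather than the analysis behind the table: the function names and the toy sequence are hypothetical, and the stop codons TAA and TAG are assumed from the standard invertebrate mitochondrial genetic code.

```python
from collections import Counter

# Assumption: stop codons of the invertebrate mitochondrial genetic code.
STOP_CODONS = {"TAA", "TAG"}

def codon_usage(cds):
    """Count codons in a coding sequence, excluding a terminal stop codon
    and any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons.pop()
    return Counter(codons)

def count_ending_with(usage, suffix):
    """Total number of codons whose sequence ends with the given suffix."""
    return sum(n for codon, n in usage.items() if codon.endswith(suffix))

toy_cds = "ATGGGGTTTGGGTAA"  # hypothetical sequence, not from U. caupo
usage = codon_usage(toy_cds)
print(sum(usage.values()), count_ending_with(usage, "GG"))
```

Summing such counters over all 13 protein-encoding genes would give the genome-wide total, and `count_ending_with(usage, "GG")` isolates the codons ending in the dinucleotide GG.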
Gene initiation and termination
Mitochondrial genes commonly use several alternatives to ATG as start codons. However, 11 of the 13 protein-coding genes of U. caupo mtDNA use ATG. The only exceptions are cox1, which uses GTG, and nad3, which uses ATC. In the case of cox1, there is an in-frame stop only three codons upstream and neither of the intervening codons is ATG. Also, this inference of starting on GTG specifies a set of amino acids well matched to those at the beginning of other Cox1 proteins. The situation for nad3 is nearly identical, with an in-frame stop only four codons upstream and no intervening ATG codons. However, downstream of the inferred start are several ATA codons that cannot be ruled out as alternatives. The common use of ATG as a start codon has also been noted for mitochondrial genes of four annelids, Platynereis dumerilii, Lumbricus terrestris, Helobdella robusta, and Galathealinum brachiosum (previously considered to be of the phylum Pogonophora), and a sipunculid, Phascolopsis gouldii.
As has been the case for all animal mtDNAs studied to date, two rRNA genes are identified, one for each of the small and large mitochondrial ribosomal subunits. Determining the precise ends of the rRNA transcripts requires experimentation, but if they are assumed to extend to the boundaries of the adjacent genes, then rrnS is 903 nucleotides and rrnL is 1266 nucleotides in length. These genes are arranged sequentially, but without an intervening tRNA gene as is otherwise commonly found.
The largest non-coding region is only 282 nts long. The region is 71% A+T and contains one palindrome of an 11 nt sequence (TCAAAAGGGGT/ACCCCTTTTGA, with a slash indicating the center), but otherwise no large repeat elements. Obviously, this has potential for forming a stem-loop structure, and it may be significant that a short sequence a few nucleotides upstream, TCAAAA, has the potential for competing with this to form a short hairpin with the TTTTGA at the end of the palindrome. There has been previous speculation that regions with potential for competing, mutually exclusive hairpins may play a role in regulating transcription and/or replication [e.g. ref. ]. There are four other potential hairpins in this region with stems of 5–6 bp and loops of 5–17 nt. All four nucleotides occur in homopolymers with much greater frequency than expected by chance, often in runs of four or five. The second largest non-coding region is 43 nt between trnS1 and cox3. This has no repeat elements and the base composition is unremarkable. What role, if any, these sequences have in the regulation of transcription and/or replication awaits further study.
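The base-pairing claims in this paragraph can be checked mechanically: two segments can form a perfect stem if one is the exact reverse complement of the other. The following is a minimal sketch with illustrative function names; only the sequences are taken from the text, and the check covers Watson-Crick complementarity alone, not thermodynamic stability.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def can_pair(left, right):
    """True if `right` is the exact reverse complement of `left`."""
    return reverse_complement(left) == right.upper()

# The two arms of the 11-nt palindrome.
print(can_pair("TCAAAAGGGGT", "ACCCCTTTTGA"))
# The upstream TCAAAA against the TTTTGA at the end of the palindrome.
print(can_pair("TCAAAA", "TTTTGA"))
```

Both checks succeed, which is why the upstream TCAAAA could in principle compete with the first arm of the palindrome for pairing with TTTTGA.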
Aside from these 282 and 43 nt regions, there are only 36 total intergenic nucleotides scattered among 14 regions. In seven cases these are 2–6 nts long (CCAAA, AT, TCCC, TAAA, CATAAA, AT, and ACACCT). For the other seven cases, genes are separated by a single nucleotide, and in six of these, that nucleotide is a C. (The remaining case is a T.) The prevalence of C is consistent with the measured G-skew between the strands, although it is possible that it instead indicates some function of these nucleotides.
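The totals quoted above can be confirmed with a short tally. The spacer sequences are those listed in the text; the variable names are illustrative only.

```python
# Intergenic spacers of 2-6 nt, as listed in the text.
multi_nt_spacers = ["CCAAA", "AT", "TCCC", "TAAA", "CATAAA", "AT", "ACACCT"]
# Single-nucleotide spacers: six C and one T.
single_nt_spacers = ["C"] * 6 + ["T"]

all_spacers = multi_nt_spacers + single_nt_spacers
region_count = len(all_spacers)
total_nt = sum(len(spacer) for spacer in all_spacers)

print(region_count, total_nt)
```

The seven longer spacers contribute 29 nucleotides and the seven single-nucleotide spacers contribute 7, giving 36 nucleotides in 14 regions.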
This is the first description of a complete mitochondrial genome sequence of a representative of the phylum Echiura. The genome contains the same 37 genes most commonly found in animal mtDNAs. Many features are most similar to those found for annelid mtDNAs, including A+T content, use of protein initiation codons, size and potential secondary structures of the largest non-coding region, and the relative arrangement of many genes. As in annelids examined to date, all genes are found on the same DNA strand. As noted for brachiopod mtDNA, there is a preference for G nucleotides to appear in tandem, without obvious explanation. Further description and comparison of complete mtDNA sequences will continue to produce a picture of genome evolution, particularly once sampling includes representatives of each animal phylum.
A preparation of Urechis caupo total DNA was the kind gift of Eric Rosenthal. The entire mtDNA sequence was obtained using techniques detailed in . Briefly, small fragments (450–710 nt) were amplified from cox1, cob, and rrnL using primer pairs HCO 2198/LCO 1490, CytbF/CytbR, and 16SARL/16SBRH, respectively. The sequences of these fragments were determined using dye-terminator chemistry (PE Biosystems) on an ABI 377 automated DNA sequencer. Primers were then designed facing "out" from these fragments to amplify the intervening regions (~2.9 to ~8 kb) using long-PCR protocols with rTth-XL polymerase (PE Biosystems) as in . Sequences were determined from the ends of these long-PCR fragments, then internally by "primer walking". To ensure quality, all sequences were determined on both strands and base calls for all chromatograms were verified by eye.
Genes encoding rRNAs and proteins were identified by matching nucleotide or inferred amino acid sequences to those of Lumbricus terrestris mtDNA. Since it is not possible to precisely determine the ends of rRNA genes from sequence data alone, they were assumed to extend to the boundaries of flanking genes. Each protein gene start was inferred as the eligible initiation codon nearest to the beginning of its alignment with homologous genes that does not cause overlap with the preceding gene. In five cases, an abbreviated stop codon was inferred where cleavage of a downstream tRNA from the transcript would leave a partial codon of T or TA, such that subsequent mRNA polyadenylation could generate a TAA stop codon. In each case, an extension of this gene to the first in-frame stop codon would cause overlap with the downstream tRNA. Genes for tRNAs were identified generically by their ability to fold into a cloverleaf structure and specifically by anticodon sequence.
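The abbreviated stop codon inference can be illustrated with a small sketch. It is not the annotation procedure itself: the function names and toy sequences are hypothetical, and DNA notation (T rather than U) is used for the transcript, as in the rest of the text.

```python
def complete_stop_codon(partial):
    """Pad a partial stop codon (T or TA) with A's, as polyadenylation of
    the cleaved transcript is proposed to do."""
    partial = partial.upper()
    if partial not in ("T", "TA"):
        raise ValueError("expected a partial stop codon of T or TA")
    return partial + "A" * (3 - len(partial))

def polyadenylated_orf(cds):
    """Return the reading frame after completing a trailing partial codon;
    sequences whose length is a multiple of three are returned unchanged."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        return cds
    return cds[:-remainder] + complete_stop_codon(cds[-remainder:])

# Hypothetical toy genes ending in a partial codon of T and of TA.
print(polyadenylated_orf("ATGGCTT"))
print(polyadenylated_orf("ATGGCTTA"))
```

In both toy cases the completed reading frame ends in TAA, which is the outcome the inference relies on.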
- cox1, cox2, cox3: cytochrome oxidase subunit I, II, and III protein genes
- cob: cytochrome b gene
- atp6, atp8: ATP synthase subunit 6 and 8 genes
- nad1, nad2, nad3, nad4, nad4L, nad5, nad6: NADH dehydrogenase subunit 1–6, 4L genes
- trnA, trnC, trnD, trnE, trnF, trnG, trnH, trnI, trnK, trnL1, trnL2, trnM, trnN, trnP, trnQ, trnR, trnS1, trnS2, trnT, trnV, trnW, trnY: transfer RNA genes designated by the one-letter code for the specified amino acid, with numerals differentiating cases where there are two tRNAs for the same amino acid
I am grateful to Eric Rosenthal for Urechis caupo DNA. This work was supported by DEB-9807100 from the National Science Foundation and by contract No. DE-AC03-76SF00098 between the U.S. Department of Energy Office of Biological and Environmental Science, and the University of California, Lawrence Berkeley National Laboratory.
- Boore JL: Animal mitochondrial genomes. Nucleic Acids Res. 1999, 27: 1767-1780. 10.1093/nar/27.8.1767.
- Anderson S, Bankier AT, Barrell BG, DeBruijn MHL, Coulson AR, Drouin J, Eperon IC, Nieflich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young IG: Sequence and organization of the human mitochondrial genome. Nature. 1981, 290: 457-465.
- Cao Y, Waddell PJ, Okada N, Hasegawa M: The complete mitochondrial DNA sequence of the shark (Mustelus manazo): Evaluating rooting contradictions to living bony vertebrates. Mol Biol Evol. 1998, 15 (12): 1637-1646.
- Clary DO, Wolstenholme DR: The mitochondrial DNA molecule of Drosophila yakuba: Nucleotide sequence, gene organization, and genetic code. J Mol Evol. 1985, 22: 252-271.
- Wilson K, Cahill V, Ballment E, Benzie J: The complete sequence of the mitochondrial genome of the crustacean Penaeus monodon: Are malacostracan crustaceans more closely related to insects than to branchiopods? Mol Biol Evol. 2000, 17 (6): 863-874.
- Boore JL: Complete mitochondrial genome sequence of the polychaete annelid Platynereis dumerilii. Mol Biol Evol. 2001, 18 (7): 1413-1416.
- Boore JL, Brown WM: Complete DNA sequence of the mitochondrial genome of the annelid worm, Lumbricus terrestris. Genetics. 1995, 141: 305-319.
- Perna NT, Kocher TD: Patterns of nucleotide composition at fourfold degenerate sites of animal mitochondrial genomes. J Mol Evol. 1995, 41: 353-358. 10.1007/BF00186547.
- Helfenbein KG, Brown WM, Boore JL: The complete mitochondrial genome of a lophophorate, the brachiopod Terebratalia transversa. Mol Biol Evol. 2001, 18 (9): 1734-1744.
- Boore JL, Brown WM: Mitochondrial genomes of Galathealinum, Helobdella, and Platynereis: Sequence and gene arrangement comparisons indicate that Pogonophora is not a phylum and Annelida and Arthropoda are not sister taxa. Mol Biol Evol. 2000, 17 (1): 87-106.
- Boore JL, Staton J: The mitochondrial genome of the sipunculid Phascolopsis gouldii supports its association with Annelida rather than Mollusca. Mol Biol Evol. 2002, 19 (2): 127-137.
- Folmer O, Black M, Hoeh R, Lutz R, Vrijenhoek R: DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol. 1994, 3: 294-299.
- Palumbi SR: Nucleic acids II: The polymerase chain reaction. In: Molecular Systematics. Edited by: Hillis DM, Moritz C, Mable BK. 1996, Sinauer Associates, Sunderland, Massachusetts, USA, 205-247.
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.