The complete mitochondrial genome of the stomatopod crustacean Squilla mantis

Background: Animal mitochondrial genomes are physically separate from the much larger nuclear genomes and have proven useful both for phylogenetic studies and for understanding genome evolution. Within the phylum Arthropoda the subphylum Crustacea includes over 50,000 named species with immense variation in body plans and habitats, yet only 23 complete mitochondrial genomes are available from this subphylum.

Results: I describe here the complete mitochondrial genome of the crustacean Squilla mantis (Crustacea: Malacostraca: Stomatopoda). This 15994-nucleotide genome, the first described from a hoplocarid, contains the standard complement of 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a non-coding AT-rich region that is found in most other metazoans. The gene order is identical to that considered ancestral for hexapods and crustaceans. The 70% AT base composition is within the range described for other arthropods. A single unusual feature of the genome is a 230 nucleotide non-coding region between a serine transfer RNA and the nad1 gene, which has no apparent function. I also compare gene order, nucleotide composition, and codon usage of the S. mantis genome and eight other malacostracan crustaceans. A translocation of the histidine transfer RNA gene is shared by three taxa in the order Decapoda, infraorder Brachyura: Callinectes sapidus, Portunus trituberculatus and Pseudocarcinus gigas. This translocation may be diagnostic for the Brachyura. For all nine taxa nucleotide composition is biased towards AT-richness, as expected for arthropods, and is within the range reported for other arthropods. Codon usage is biased, and much of this bias is probably due to the skew in nucleotide composition towards AT-richness.

Conclusion: The mitochondrial genome of Squilla mantis contains one unusual feature, a 230 base pair non-coding region that has so far not been described in any other malacostracan. Comparisons with other Malacostraca show that all nine genomes, like most other mitochondrial genomes, share a bias toward AT-richness and a related bias in codon usage. The nine malacostracans included in this analysis are not representative of the diversity of the class Malacostraca, and additional malacostracan sequences would surely reveal other unusual genomic features that could be useful in understanding mitochondrial evolution in this taxon.


Background
Mitochondria are extranuclear organelles present in all metazoans. They contain a circular genome, usually around 16 kilobases in length, with 37 genes (13 protein-coding genes, two ribosomal RNA genes, and 22 transfer RNA genes). This gene content is widely conserved, but gene order and the DNA sequences of the genes themselves are variable. Because of their small size many more mitochondrial genomes than nuclear genomes have been sequenced, and comparisons among them may serve as models for the evolution of the much larger nuclear genomes [1]. In addition, gene order rearrangements and mitochondrial gene sequences have been widely used for phylogenetic inference [2][3][4][5][6][7].
At present there are about 650 complete mitochondrial genomes available in public databases. Of these, about 75 percent are of vertebrates. By contrast only 129 complete mitochondrial genomes are available from arthropods, which are the most diverse and speciose phylum of animals. In addition, there is considerable taxonomic bias among the available arthropod sequences: 86 (67 percent) are hexapods. The subphylum Crustacea includes over 50,000 named species and is ecologically and morphologically the most diverse of the arthropod groups, and therefore of all the animals. Crustaceans occupy marine, terrestrial, and fresh water habitats from the deep sea to high mountains; range in adult size from less than one millimeter to more than four meters (leg span); and exhibit extensive variability in body plans when compared to other arthropod groups [8]. At present there are 23 complete crustacean mitochondrial genomes available. Within the Crustacea, members of the class Malacostraca, which includes crabs, lobsters, and shrimp, are perhaps the best known to non-scientists. Because of its economic importance this group is often the focus of scientific enquiry. At present there are nine complete malacostracan mitochondrial genomes available, including that of the stomatopod shrimp Squilla mantis. In this paper I describe this genome and compare it to the eight other mitochondrial genomes that are available from other Malacostraca.
Mantis shrimps, or stomatopods, are benthic predators distributed in the shallow waters of tropical and subtropical seas. They are best known for their raptorial appendages (pointed or clubbed), which they use to make lightning-fast attacks that disable prey animals by spearing or blunt trauma. Large individuals with the club-type appendages have been known to shatter the sides of aquaria [9]. Squilla mantis (Linnaeus, 1758) (Crustacea: Malacostraca: Stomatopoda), with a maximum length of around 20 cm, is distributed in shallow waters throughout the Mediterranean Sea and Eastern Atlantic [10]. S. mantis is widely consumed by humans throughout its range; UN Food and Agriculture Organization statistics indicate that total catches in the Mediterranean are currently in excess of 6500 tonnes per annum, so this species is of some commercial importance [11].

Mitochondrial genome composition
The mitochondrial genome of Squilla mantis (GenBank accession number AY639936) is a circular molecule of 15994 nucleotides that contains the same 13 protein-coding genes, 22 transfer RNA (tRNA) genes, and two ribosomal RNA (rRNA) genes found in other metazoans [12,13]. The majority strand (i.e., the strand encoding the majority of genes) encodes nine protein-coding genes and 14 tRNAs, while the minority strand encodes four protein-coding genes, eight tRNAs and both rRNA genes (Table 1).
Figure 1. Squilla mantis mitochondrial tRNA genes folded into inferred cloverleaf structures.
The S. mantis genome, like other arthropod mitochondrial genomes, is AT-rich, with an overall AT content of 70%. This frequency, as expected, varies among regions of the genome. First and second codon positions average 62% and 63% AT, respectively, tRNA and rRNA genes average 72%, third codon positions average 79%, and putative non-coding regions reach up to 87% AT content (Table 2). There are no significant differences in AT frequency between genes encoded on the majority and minority strands. These values are within the range of 60-87% reported for other arthropods and are not unusual [14,15].
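The composition figures above are simple base counts. As a minimal sketch of how such values could be computed (not the procedure used for Table 2), the following Python functions return the AT fraction of a sequence and of each codon position in an in-frame coding sequence. The function names and the toy sequence are hypothetical and are not taken from the S. mantis genome.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases (A, C, G, T) in seq."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T")
    return sum(b in "AT" for b in counted) / len(counted)


def at_by_codon_position(cds: str) -> tuple[float, float, float]:
    """AT fraction at first, second, and third codon positions of an in-frame CDS."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon, e.g. a lone T
    return tuple(at_content(cds[pos:usable:3]) for pos in range(3))


# Hypothetical toy sequence for illustration only.
toy_cds = "ATGATTTTACCAGCTTAA"
print(f"overall AT: {at_content(toy_cds):.2f}")
print("by codon position:", [f"{x:.2f}" for x in at_by_codon_position(toy_cds)])
```

Applied separately to protein-coding genes, RNA genes, and unassigned nucleotides, counts of this kind would give the per-region breakdown described in the text.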

Gene structure
The predicted structures of the 22 S. mantis tRNAs are shown in Figure 1. Twenty-one of these genes were identified by tRNAscan-SE [16] and have secondary structures similar to those of other published metazoan tRNA genes. Two genes, trnS1 and trnQ, have single T-T mismatches in the acceptor stem, and one gene, trnM, has a single C-A mismatch in the stem of the TψC loop. The trnS1 gene was not identified by tRNAscan-SE; rather, it was located by its conserved position in the genome. The variable loop of this gene, with nine nucleotides, is longer than the average of four or five for mitochondrial tRNA genes. This feature is characteristic of type 2 transfer RNA genes, which are uncommon in animal mitochondria but are the norm for bacterial and eukaryotic trnS genes.
The large and small subunit ribosomal RNA (rRNA) genes (rnl, rns) have an AT content of 67%, within the range reported for other arthropod ribosomal RNA genes. Alignments of these genes with other arthropod homologues (not shown) show, as expected, both conserved and unconserved regions that correspond with the putative stems and loops within these genes. There are thus no unusual features to report for the two rRNA genes. All of the 13 protein-coding genes, except cox1, have putative ATR methionine or ATT isoleucine start codons. The putative first codon of the cox1 gene is ACG threonine. The lack of a standard initiation codon in cox1 genes is common in arthropod mitochondria, so S. mantis is not unusual in this respect [17,18]. Two of the protein-coding genes, cox1 and nad6, lack a full TAA or TAG stop codon. These genes appear to terminate with a single T from which a stop codon is created by polyadenylation of the mRNA during processing. Again, this phenomenon has been observed in other arthropod mitochondrial genomes and is not unusual [17,19,20].
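The start and stop codon conventions described above can be expressed as a small check. The sketch below is illustrative only: it assumes the coding sequence is supplied from its first codon to its last annotated nucleotide, so that an incomplete stop shows up as one or two leftover bases (T or TA). The function name and the example sequences are hypothetical.

```python
# Start codons named in the text: ATR (ATA, ATG) for methionine and ATT for isoleucine.
START_CODONS = {"ATA": "Met", "ATG": "Met", "ATT": "Ile"}
FULL_STOPS = {"TAA", "TAG"}


def describe_ends(cds: str) -> tuple[str, str]:
    """Return (start description, stop description) for a coding sequence."""
    cds = cds.upper()
    first = cds[:3]
    start_desc = f"{first} ({START_CODONS.get(first, 'non-standard')})"
    remainder = len(cds) % 3
    if remainder == 0 and cds[-3:] in FULL_STOPS:
        stop_desc = f"complete {cds[-3:]}"
    elif remainder and cds[-remainder:] in ("T", "TA"):
        # A partial stop is assumed to be completed to TAA by polyadenylation.
        stop_desc = f"incomplete {cds[-remainder:]}"
    else:
        stop_desc = "no recognizable stop"
    return start_desc, stop_desc


# Hypothetical toy sequences, not S. mantis genes.
print(describe_ends("ATGAAATAA"))
print(describe_ends("ACGAAAT"))
```

A check like this only flags candidates; deciding where a gene with an incomplete stop actually ends still depends on the position of the neighboring gene.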

AT-rich regions
Arthropod mitochondrial genomes typically have a long region that has an AT content higher than that of mitochondrial coding regions. This AT-rich region, ranging from 263 to 4601 base pairs in length and usually located between the rns and trnI genes, is often termed the control region because it contains a number of regulatory elements including the origin of replication for the heavy strand of the mitochondrial genome [21,22]. In some arthropods the AT-rich region is reported to have any or all of these four different motifs: tandemly repeated sequences, a long sequence of T's, a subregion of even higher AT richness, and stem-loop structures [23,24].
In S. mantis there are two AT-rich regions, numbered 1 and 2 in Tables 2 and 3. AT-rich region 2 corresponds to the conventional arthropod region between rns and trnI; it is 862 base pairs long, well within the reported range for other arthropods, with an AT content of 77% compared to 70% for the entire S. mantis mitochondrial genome. However, this region has none of the four motif types that have been reported for arthropods, and I was not able to identify any putatively functional motifs.
I therefore examined the shorter AT-rich region of 230 nucleotides between the trnS2 and nad1 genes for possible functional motifs. Most arthropod mitochondrial genomes have a few short non-coding regions between some genes, usually from a few bases to 20 bases long, but longer non-coding regions, such as AT-rich region 1 in S. mantis, are rare. It therefore seemed possible that this region might have taken over some of the functions putatively assigned to the longer AT-rich region. However, AT-rich region 1, like region 2, contains none of the motifs listed above. Furthermore, the AT content of this region, at 87%, is similar to that calculated collectively for other unassigned nucleotides in the S. mantis genome (Table 2) and is consistent with the hypothesis that this region has no function.
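Three of the four motif types mentioned above (a long run of T's, a subregion of higher AT richness, and tandem repeats) lend themselves to simple sequence scans. The following Python sketch shows one possible way to screen a non-coding region for them. It is not the method used in this study, it does not attempt stem-loop prediction, and the function names, window sizes, and repeat thresholds are arbitrary assumptions.

```python
import re


def longest_t_run(seq: str) -> int:
    """Length of the longest uninterrupted run of T in seq."""
    return max((len(run) for run in re.findall(r"T+", seq.upper())), default=0)


def windowed_at(seq: str, window: int = 50, step: int = 10) -> list[tuple[int, float]]:
    """AT fraction in sliding windows, as (start index, AT fraction) pairs."""
    seq = seq.upper()
    if not seq:
        return []
    results = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        results.append((start, sum(b in "AT" for b in chunk) / len(chunk)))
    return results


def tandem_repeats(seq: str, unit_min: int = 2, unit_max: int = 10,
                   min_copies: int = 3) -> list[tuple[int, str, int]]:
    """Find (start, unit, copies) for short units repeated head to tail."""
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (unit_min, unit_max, min_copies - 1)
    )
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence for illustration only.
region = "GGACTACTACTGGTTTTTTTAC"
print(longest_t_run(region), tandem_repeats(region))
```

Note that with a minimum unit length of two, a long homopolymer run is also reported as a tandem repeat, so the outputs of the three scans overlap and need to be read together.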
Unusual genomic features, such as this non-coding region or gene order rearrangements, can be useful as characters for reconstructing evolutionary relationships [13,25]. A second AT-rich region is notably absent even in Harpiosquilla harpax, which is also a member of the family Squillidae. A survey of other members of the genus Squilla for the presence of a similar region would perhaps enhance our understanding of the history of this unusual genomic feature.

Comparison with other malacostracan crustaceans
A number of features of mitochondrial genomes can be used to infer relationships among taxa. These include phylogenetic analysis using DNA and protein sequences, relative rates of sequence evolution, gene order, and the effective number of codons (Nc). I present a phylogenetic analysis and a discussion of rates of sequence evolution in arthropods (including S. mantis) elsewhere [26], and discuss gene order and Nc below.
Rearrangements of the mitochondrial genome are relatively rare events in evolutionary history. Such rare events can be used to infer relationships among taxa, and mitochondrial gene order rearrangements have proven useful in understanding some aspects of arthropod evolution [27][28][29]. Figure 2 shows the mitochondrial gene order for the nine Malacostraca for which there are complete mitochondrial genomes. Five of these genomes share the gene order considered ancestral for the Pancrustacea (Crustacea + Hexapoda) [27]. Callinectes sapidus, Portunus trituberculatus and Pseudocarcinus gigas share a single translocation of trnH compared to the ancestral gene order. The mitochondrial genome of Cherax destructor is considerably rearranged and evidences at least seven translocation events compared to the ancestral pancrustacean arrangement [20]. C. sapidus, P. trituberculatus and P. gigas are all decapods within the infraorder Brachyura (Table 1). The trnH translocation shared by these three taxa is therefore not surprising. It is possible that this character is shared among all of the Brachyura, and could therefore serve as a marker for membership in this group and might aid in rapid identification of unidentified individuals, such as larvae or processed materials in markets.
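A single-gene translocation of the kind described for trnH can be detected by asking which gene, once removed from both arrangements, leaves the two orders identical. The Python sketch below illustrates that idea under simplifying assumptions: gene orders are treated as linear lists, strand is ignored, and the gene-order fragments are hypothetical examples rather than the arrangements shown in Figure 2.

```python
def single_translocations(ancestral: list[str], derived: list[str]) -> list[str]:
    """Genes whose removal from both orders makes the two orders identical."""
    if sorted(ancestral) != sorted(derived):
        raise ValueError("both orders must contain the same genes")
    if ancestral == derived:
        return []
    hits = []
    for gene in ancestral:
        rest_ancestral = [g for g in ancestral if g != gene]
        rest_derived = [g for g in derived if g != gene]
        if rest_ancestral == rest_derived:
            hits.append(gene)
    return hits


# Hypothetical fragments for illustration; tRNA genes use single-letter codes.
ancestral_fragment = ["E", "F", "nad5", "H", "nad4", "nad4L"]
derived_fragment = ["E", "H", "F", "nad5", "nad4", "nad4L"]
print(single_translocations(ancestral_fragment, derived_fragment))
```

If a gene has moved by only one position, both it and its former neighbor are reported, since either move explains the difference. Heavily rearranged genomes such as that of C. destructor return an empty list and need a proper rearrangement-distance method.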
The effective number of codons used in a gene, Nc, is a statistic developed by Wright [30] to quantify how far codon usage in a gene departs from the equal use of all synonymous codons. The value of Nc can range from 20, the theoretical extreme in which only one codon is used for each amino acid, to 61 when the use of all synonymous codons is equally likely. This statistic, initially developed to compare codon usage between different genes in the same genome, can also be used to compare codon usage between genomes. I calculated Nc for each of the nine malacostracan mitochondrial genomes using the program CodonW [31]. Table 4 shows Nc and GC content for majority strand, minority strand, and all protein genes in the malacostracan mitochondrial genomes. GC rather than AT content is presented to conform to the convention for these comparisons. The Nc values, which range from 38 to 53, are all below the value of 61 that indicates random codon usage, so codon usage in all nine genomes is non-random. There are no obvious similarities in the values for related taxa (i.e., the decapods), but extensive additional sampling among the Malacostraca would be necessary to confirm this observation.
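To make the statistic concrete, the following Python sketch computes Nc in the spirit of Wright's definition: a homozygosity F = (n Σp² − 1)/(n − 1) is calculated for each amino acid, averaged within each degeneracy class, and the class sizes divided by the mean F values are summed. This is a simplified illustration and not the CodonW implementation. It assumes NCBI translation table 5 (invertebrate mitochondrial), under which there are 62 sense codons, so the cap in this sketch is 62 rather than the 61 of the standard code. Treating a class with no usable data as unbiased is a further assumption of the sketch.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order.
TABLE5 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {a + b + c: TABLE5[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def effective_number_of_codons(cds: str, code: dict[str, str] = CODE) -> float:
    """Simplified Wright-style Nc for an in-frame coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    family = defaultdict(list)  # amino acid -> its synonymous sense codons
    for codon, aa in code.items():
        if aa != "*":
            family[aa].append(codon)
    f_by_class = defaultdict(list)  # degeneracy class -> homozygosity values
    for codons in family.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # F is undefined for these amino acids
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)
    nc = 0.0
    for k, n_aa in Counter(len(c) for c in family.values()).items():
        if k == 1:
            nc += n_aa
        elif f_by_class[k]:
            nc += n_aa / (sum(f_by_class[k]) / len(f_by_class[k]))
        else:
            nc += n_aa * k  # assumption: no data for this class, treat as unbiased
    n_sense = sum(len(c) for c in family.values())
    return min(nc, float(n_sense))  # cap at the number of sense codons
```

A sequence that uses a single codon for every amino acid gives the minimum of 20, while perfectly even usage reaches the cap.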
Figure 2. Mitochondrial gene orders for nine malacostracan crustaceans. The style of the figure is adapted from Figure 1 of Lavrov et al. [25]. Protein and ribosomal RNA genes (large boxes) are abbreviated as in the text. Transfer RNA genes are abbreviated with single letter codes (see Figure 1). The striped box represents the AT-rich region. The ancestral pancrustacean gene order is found for five of the nine taxa, including Squilla mantis. The position of AT-rich region 1 in the S. mantis genome is noted with an arrow. Genes are transcribed from right to left except when underlined. Shaded boxes indicate genes whose positions differ from their positions in the ancestral pancrustacean sequence. The number in parentheses next to taxa names represents the minimum number of rearrangement events that separates that gene arrangement from the ancestral pancrustacean gene order (see Miller et al. [20] for a fuller discussion of the rearrangements in C. destructor).

In Figure 3 Nc and GC values are plotted. The distribution of the points suggests a linear relationship between Nc and GC content. A similar association of Nc and GC content was observed by Negrisolo et al. [32]. Equations representing a least squares linear regression analysis are shown for all three data sets in Table 4. Only the regression line for the all-genes data set is shown in Figure 3 to prevent clutter. These equations are not statistically probative, but the distribution of points around the line shown in Figure 3 does add to the qualitative impression of a relationship between Nc and GC content. I also calculated the coefficient of determination (R²) for each data set. The values for the majority and minority strand columns are near 0.5, suggesting that around 50% of the variation in one variable is associated with the other. That is, if GC content is taken as the independent variable, then 50% of the codon bias in the majority and minority strands is due to the influence of the bias towards low GC values. When both data sets are combined R² rises to 0.91, suggesting that codon bias and GC content are very closely associated. This discordance between R² for each strand separately and R² for all protein-coding genes is puzzling and merits additional study.
However, if one assumes that GC content is driven by other biochemical factors then it is clear that much, if not most, of the codon bias observed in these mitochondrial genomes is a consequence of this nucleotide bias.
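For readers who wish to reproduce this kind of analysis, the following Python sketch fits an ordinary least squares line and reports the coefficient of determination for paired values such as GC content and Nc. The function name is hypothetical and the example numbers are invented for illustration; they are not the values in Table 4.

```python
def linear_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Invented GC fractions and Nc values, for illustration only.
gc_values = [0.26, 0.29, 0.31, 0.34, 0.37]
nc_values = [39.0, 42.5, 44.0, 48.5, 51.0]
slope, intercept, r2 = linear_fit(gc_values, nc_values)
print(f"Nc = {slope:.1f} * GC + {intercept:.1f}, R^2 = {r2:.2f}")
```

Running the same function on each strand separately and on the pooled genes gives the three R² values that the text compares.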

Conclusion
This is the first formal description of the mitochondrial genome of a stomatopod crustacean. This genome maintains the same genes and gene order that are inferred as ancestral in the Pancrustacea, but does contain one unusual feature: a 230 base pair AT-rich region between the trnS2 and nad1 genes. This feature has no discernible function, but it may prove useful as a character in understanding the evolutionary history of the genus Squilla. Three other arthropod mitochondrial genomes have two large non-coding regions: the ostracod crustacean Vargula hilgendorfii, a millipede Thyropygus sp., and the tick Boophilus microplus [15,33,34]. Of these the latter two are [...]

A comparison of nine malacostracan genomes, including that of S. mantis, shows that all nine exhibit the nucleotide composition bias favoring A and T nucleotides that is commonly observed for arthropod genomes, and that this bias is responsible at least in part for the observed codon usage bias in these genomes. However, there are no observable patterns of nucleotide composition bias or codon usage bias that unite particular taxa into common groups. These nine taxa represent only a small fraction of the diversity of the Malacostraca, and additional sequencing from across the diversity of this taxon would provide additional data for understanding the evolution of mitochondrial genomes of the class.

Figure 3. Relationship between GC content and effective number of codons for mitochondrial protein-coding genes. The line represents the least squares linear regression calculated for the all genes data set. This equation is shown in Table 4.

Samples and DNA extraction
A single freshly caught specimen of S. mantis was purchased from the fish market at Heraklion, Crete, Greece. Six grams of abdominal muscle were dissected from the specimen and immediately frozen at -70°C. Approximately one half gram of the frozen tissue was shaved from the sample using a sterile razor blade, and genomic DNA was extracted using a QIAGEN genomic-tip 20 and the associated buffer set according to the manufacturer's protocol.

PCR, sequencing, and annotation
Short fragments (300-1000 nucleotides) of the mitochondrial genome were amplified at low stringency (annealing temperatures of 50-55°C) using primers designed to work on all arthropods (Table 4). Amplification products were cloned into the T-Easy vector (Promega) and at least three clones from each PCR product were sequenced with vector-specific primers using ABI Big-Dye chemistry. Squilla-specific primers were designed and used to amplify longer fragments of 1000-4000 nucleotides that spanned the gaps between the short fragments. Longer fragments were also ligated into the T-Easy vector and at least three clones from each ligation were isolated. Each clone was sequenced using a primer-walking strategy initiated with vector-specific primers. Sequences were assembled using Sequencher v. 3.1 (GeneCodes Corp.). Protein-coding genes were identified using BLAST searches [35] and by comparison with other arthropod mitochondrial genome sequences. Transfer RNA genes were identified using tRNAscan-SE [36]. Transfer RNA sequences were folded by eye, with reference to the tRNAscan-SE server output when that was available. The effective number of codons, Nc, was calculated using the software package CodonW [31].