The complete mitochondrial genome of the sea spider Nymphon gracile (Arthropoda: Pycnogonida)

Background Mitochondrial genomes form units of genetic information replicating indepentently from nuclear genomes. Sequence data (most often from protein-coding genes) and other features (gene order, RNA secondary structure) of mitochondrial genomes are often used in phylogenetic studies of metazoan animals from population to phylum level. Pycnogonids are primarily marine arthropods, often considered closely related to chelicerates (spiders, scorpions and allies). However, due to their aberrant morphology and to controversial results from molecular studies, their phylogenetic position is still under debate. Results This is the first report of a complete mitochondrial genome sequence from a sea spider (Nymphon gracile, class Pycnogonida). Gene order derives from that of other arthropods so that presumably 10 single tRNA gene translocations, a translocation of the mitochondrial control region, and one large inversion affecting protein-coding genes must have happened in the lineage leading to Nymphon gracile. Some of the changes in gene order seem not to be common to all pycnogonids, as those were not found in a partial mitochondrial genome of another species, Endeis spinosa. Four transfer RNAs of Nymphon gracile show derivations from the usual cloverleaf secondary structure (truncation or loss of an arm). Initial phylogenetic analyses using mitochondrial protein-coding gene sequences placed Pycnogonida as sister group to Acari. However, this is in contrast to the majority of all other studies using nuclear genes and/or morphology and was not recovered in a second analysis where two long-branching acarid species were omitted. Conclusion Extensive gene rearrangement characterizes the mitochondrial genome of Nymphon gracile. At least some of the events leading to this derived gene order happened after the split of pycnogonid subtaxa. Nucleotide and amino acid frequencies show strong differences between chelicerate taxa, presumably biasing phylogenetic analyses. Thus the affinities between Pycnogonida and Acari (mites and ticks), as found in phylogenetic analyses using mitochondrial genes, may rather be due to long-branch attraction and independently derived nucleotide composition and amino acid frequency, than to a real sister group relationship.


Background
Due to their evolutionary history as derived endosymbionts, mitochondria have retained genetic material -the mitochondrial genome. Much of their original gene content was eliminated or transferred to the nucleus [1], while only a small proportion of genes has persisted to the present. In triploblastic animals the circular mitochondrial genome is sized around 11-20 kilobases and contains typically 37 genes: 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes [2]. Mitochondrial genomes serve as a simple model for modes and mechanisms of gene rearrangements and genome evolution and provide large datasets for phylogenetic analyses. The frequent use of mitochondrial genes for inferring phylogenetic relationships of animals is due to their universal distribution among taxa, strongly conserved regions in some genes (facilitating universal PCR primer sets) and the absence of paralog genes [3]. However, the incidental presence of nuclear copies of mitochondrial genes [4] and strong differences in nucleotide composition between taxa [5] may complicate phylogenetic analyses.
During the last ten years mitochondrial genome data have played an important role in redefining arthropod relationships. The position of mitochondrial trnL2 is changed in crustaceans and hexapods, but not in chelicerates and myriapods [6]. Also from sequence-based analyses of mitochondrial [7,8] and nuclear genes [9][10][11] the Pancrustacea hypothesis found strong support, while the traditional Tracheata hypothesis, mainly based on morphological data, is now widely rejected. Mitochondrial genome data also provided strong evidence towards the identification of the formerly enigmatic Pentastomida (tongue worms) as aberrant crustaceans [12]. While those hypotheses were collectively supported by nuclear and mitochondrial data, some other hypotheses obtained with mitochondrial genome data are highly disputed, as for example the polyphyly of hexapods [7,8] or the phylogenetic position of pycnogonids [13].
Pycnogonids or sea spiders are among the most bizarre arthropods, some of them with very large legs attached on a tiny body. Food uptake is performed by a pharyngeal suction tube, some species have even lost all head appendages (chelifores and pedipalps). Due to their derived morphology their phylogenetic position remains uncertain, although most workers consider them as primarily aquatic chelicerates [14]. Recent phylogenetic analyses using a combination of molecular and morphological data [11], or nuclear genes [9,10] support a basal position among chelicerates. In contrast, sequence data from partial mitochondrial genomes suggest an affin ity to Acari (mites and ticks) [13], thus implying a terrestrial origin of pycnogonids. Recently, neuroanatomical data suggest that pycnognid chelifores are not positionally homologous to cheliceres [15], thus questioning pycnogonid affinities to Euchelicerata. However, hox gene expression data do not support the this view [16]. We report here the first complete mitochondrial genome sequence for a member of the Pycnogonida,Nymphon gracile. We use these data to analyse chelicerate relationships and to evaluate hypotheses of the phylogenetic position of Pycnogonida. We also discuss ancestral and derived features of the mitochondrial genome of Nymphon gracile and the influence of AT-content and differences of amino acid frequencies on phylogenetic analyses.

Mitochondrial genome organization
The mitochondrial genome of Nymphon gracile is a circular DNA molecule of 14,681 bp length [Gen-Bank:DQ666063]. All 37 genes expected for animal mitochondrial genomes have been identified. Gene overlaps (7 bp) exist between nad4 and nad4L, as well as between atp8 and atp6, as is reported for many other mitochondrial genomes. Six out of thirteen protein-coding genes show incomplete stop codons (T or TA), which is probably compensated by posttranscriptional polyadenylation [17].
Gene order (Fig. 1, Tab. 1) of the mitochondrial genome differs in many positions from that of the horseshoe crab Limulus polyphemus, which is considered to represent the euarthropod ground pattern [18]. One large segment (about 3,500 bp) containing the protein-coding genes cox1, cox2, nad2 and the transfer RNA genes trnC, trnW, trnI were probably subject to an inversion. Maintaining their original gene order, these genes were found on the opposite strand in Nymphon gracile compared to other arthropods. Probably trnK and trnD were also involved in the same inversion, meanwhile separated from cox2 by six tRNA genes, which have different positions in other arthropods. Further translocation events of single tRNA genes may have led to their actual position in Nymphon gracile. Three tRNAs were lost from the inverted segment (trnQ, trnM, trnY) and translocated to other positions in the mitochondrial genome. Two tRNA genes (trnE and trnR) have interchanged their positions, making it impossible to decide which of them was translocated and which stayed in its position. Supposing that trnK and trnD have changed their position due to the large inversion mentioned above, altogether ten tRNA genes have undergone individual translocation events. Except from those involved in the inversion, no other protein-coding or rRNA gene has changed its position compared to the ground pattern of Euarthropoda, represented by Limulus polyphemus [18]. The two trnL genes lie adjacent to another between nad1 and rrnL, as expected for the euarthropod ground pattern. This differs from the derived condition in Hexapoda and Crustacea (trnL2 between cox1 and cox2, trnL1 between nad1 and rrnL).
A large non-coding region is present between nad3-trnE-trnR and trnY-trnF-nad5. This is very likely to be the mitochondrial control region, which therefore must have been translocated, too. As the strand bias of nucleotide frequencies is comparable to other arthropods (see below) the control region seems not to be inverted as it is assumed for scorpions and two web spider species [13].
A partial (5105 bp) mitochondrial genome of another species of Pycnogonida, Endeis spinosa [Gen-Bank:AY731173], was published recently [5,13] and revealed no differences in gene order to Limulus polyphemus in the segment ranging from nad2 to cox3. Therefore we presume that the large inversion recorded in Nymphon gracile must have happend after the split between the two clades. Assuming a larger taxon sampling the derived gene order of Nymphon gracile may serve as an apomorphic character supporting a subtaxon of Pycnogonida. Six of the ten individually translocated tRNA genes of Nymphon gracile found their new position between trnK and cox2. In Endeis spinosa these two genes are adjacent, therefore these six tRNA translocations may also have happened after the split of the two clades.

Secondary structure of RNAs
Out of 22 transfer RNAs usually present in metazoan mitochondrial genomes we have identified 18 by tRNAscan [19]. The remaining four show derivations from the typical cloverleaf structure: the DHU stem and loop is extremely short or missing in trnA, trnN and trnS1, while it is definitely missing in trnR (putative secondary structures are shown in Fig. 2). Such aberrant secondary structures are often found in metazoan mitochondrial tRNAs, some taxa even show derivations in the majority of their tRNA genes (e.g. some nematodes [20], or web spiders [21,22]). It is not clear if the function of such derived tRNAs is maintained in every case, as there are reports of recruitment of nuclear tRNAs into mitochondria [1]. In spiders however, it is very likely that tRNAs are functional, despite lacking the TΨC stem and loop [23]. trnS1 from the horseshoe crab Limulus polyphemus is also missing the complete DHU stem and loop, while all other tRNAs in this species could be folded into cloverleaf structures [24].

Nucleotide composition
Nucleotide compositions of protein-coding and ribosomal RNA genes clearly demonstrate a strand specific bias (Tab. 2). CG-skew is positive for all genes on (+)-strand and negative for genes on (-)-strand, AT-skew is positive or only slightly negative in (+)-strand genes, while strongly negative in (-)-strand genes. This is also true for genes which have changed from one strand to the other due to an inversion during pycnogonid evolution (cox1, cox2, nad2). Strand bias in nucleotide composition is Gene order in pycnogonids (Nymphon gracile, this study; Endeis spinosa, according to [13]) Figure 1 Gene order in pycnogonids (Nymphon gracile, this study; Endeis spinosa, according to [13]). Comparison to the ground pattern of Euarthropoda (here represented by Limulus polyphemus and Heptathela hangzhouensis). Asterisks indicate that a gene is located on the opposite strand. Transfer RNA genes are depicted by their corresponding one-letter amino acid code. Colored genes have derived relative positions in the derived gene order of Nymphon gracile. Lines refer to putative independent translocation events.
probably due to asymmetries during the replication of mitochondrial genomes, leading to different mutational pressures on both strands. In the literature the exact mechanisms are controversially discussed: one strand stays single-stranded during replication in the stranddisplacement model [25], or is subject to extensive incorporation of ribonucleotides in the strand-coupled model [26]. Almost all arthropods show this strand bias [5,13]. In some species a reversal is seen, probably due to a strand swap of the mitochondrial control region (e.g. in scorpions and web spiders [5], as well as in an isopod [27]). As seen from the pycnogonid and spider examples, strand reversal of single genes leads to a quick reversal of strand bias, too. In performing phylogenetic analyses one has to take into account these findings and probably has to modify evolutionary models [5].

Phylogenetic analysis
Phylogenetic analysis using nucleotide sequences from all mitochondrial protein-coding genes (Fig. 3) reveals a strong support for a taxon comprised of Nymphon gracile and Acari (mites and ticks) from BI but not from ML bootstrap analysis. Similar results were published based on an analysis of five mitochondrial protein-coding genes [13]. This contradicts almost every other phylogenetic study which included pycnogonids (the only recent exception not based on mitochondrial genes is a combined analysis of 18S/28S sequences and morphological data, including  Putative secondary structures of mitochondrial tRNA molecules from Nymphon gracile Figure 2 Putative secondary structures of mitochondrial tRNA molecules from Nymphon gracile. fossils [28], where Pycnogonida, Acari and Palpigradi together form one clade -but with extreme character conflict). Recent multigene analyses place pycnogonids outside Arachnida: either (1) as sister group to Euchelicerata (Xiphosura + Arachnida), e.g. with a combined alignment of EF-1α, EF-2, and RNA-Pol II [10], as well as with a combined dataset of nine genes and morphology [11]; or (2) in an unresolved trichotomy with Euchelicerata and Myriapoda (together forming the clade Paradoxopoda or Myriochelata), e.g. using a combined aligment of 18S and 28S [9].
As a consequence of an Acari-Pycnogonid clade one would have to assume that pycnogonids may either have had terrestrial ancestors, or that an independent transition to terrestrial life was undertaken by Acari on the one hand, and spiders and scorpions on the other hand (leaving the question open where to place the remaining arachnid taxa). At least the first hypothesis is contradicted by the fossil record: the oldest pycnogonid fossil is from the upper Cambrian, a time from which no terrestrial animal is known [29] -the first terrestrial arachnids were not found before the Silurian -a gap of about 70 million years.
To determine effects of long-branch attraction [30], a second analysis was performed, on a dataset where sequences from the long-branching acarids Varroa destructor and Leptotrombidium pallidum were omitted (Fig. 4). In the resulting tree Pycnogonida does not cluster with Acari, rather appears as sister taxon to Euchelicerata in the ML tree, but without good support from BI or ML bootstrap analysis.
Chelicerata, Euchelicerata and Arachnida as well find no good support from these inferences. So it is very likely that long-branch attraction is one major reason for the clustering of Pycnogonida and Acari in phylogenetic analysis of chelicerate relationships with mitochondrial genes, while the true position of Pycnogonida remains far from being resolved by our analyses.

AT content and amino acid usage in chelicerate mitochondrial genomes
As mentioned above, some of the arachnid taxa show a reversal in nucleotide frequency bias (the scorpion Centruroides limpidus, the spiders Ornithoctonus huwena and Habronattus oregonensis, and the mite Varroa jacobsohni), which may be one reason for misleading results in a phylogenetic analysis [5,13]. However, it was shown before that a reversal in strand bias seems not to be the cause for the affinity between Acari and Pycnogonida [13].
With the exception of Limulus polyphemus all chelicerates in our phylogenetic analysis show branch lengths two or three times longer than those of hexapods, crustaceans and myriapods (Fig. 3, Fig. 4). This implies that arachnid and pycnogonid species have undergone more change in nucleotide sequence than Limulus polyphemus and the remainder of arthropod taxa in our study. With the exception of the scorpion Centruroides limpidus all arachnid and pycnogonid species in our study show a higher AT content in protein-coding genes than Limulus polyphemus and the outgroup taxa (Tab. 3). This is more striking if only third codon positions are compared. Strong variation in AT content between species may also lead to perturbance of AT skew ((A%-T%)/(A%+T%)) and CG skew ((C%-G%)/(C%+G%)) according to [44]. Values from (-)strand genes in bold letters.
Maximum likelihood tree of chelicerate relationships -complete dataset  Bold numbers indicate strong differences (+/-25%) to Limulus polyphemus (underlined). This species was chosen for comparison, because in the phylogenetic analyses (Fig. 3, 4) it shows a branch length comparable to those of crustacean, myriapod and hexapod species, while the remainder of chelicerates was subject to a higher degree of nucleotide substitution. See methods for complete names of species.
phylogenetic analyses [31]. Independent evolution of higher AT content my lead to homoplastic similarities.
Looking at the amino acid composition of the proteincoding genes from chelicerates and other arthropods (Tab. 3), again strong differences are observed between Limulus polyphemus and the remainder of Chelicerata. Thus the noticed derivations in amino acid usage seem to be directly linked to AT-content: the higher the proportion of adenine and thymine, the stronger the differences in amino acid usage. And in fact, for some amino acids Nymphon gracile together with some Acari (the taxa showing the highest AT contents as mentioned above) show the strongest differences to Limulus polyphemus (Ala, Gly, Phe, Trp). This fact may have further promoted the clustering of Acari and Nymphon gracile in our first phylogenetic analysis (Fig. 3) and in [13]. In contrast, a web spider (Habronattus oregonensis) which also shows high AT content does not cluster with Acari and Nymphon gracile, probably due to a balancing effect of the other two web spider species, which show a comparably moderate AT content.

Conclusion
Ten individually translocated tRNA genes, a large inversion of a segment covering three protein-coding genes and five tRNA genes, and translocation of the control region lead to a derived gene order in Nymphon gracile compared to other arthropods, including another species of Pycnogonida (Endeis spinosa). If sequence data from more pycnogonid species becomes available, gene translocations may serve as phylogenetic markers, which probably resolve relationships between pycnogonid subtaxa. Phylogenetic analysis of chelicerate relationships using mitochondrial protein-coding genes supported a clade consisting of Pycnogonida and Acari. These results contradict other analyses performed with nuclear genes or mor-phological characters. Omiting some of the longbranching acarid species from the analysis led to a tree with unresolved relationships between Myriapoda, Pycnogonida, Xiphosura and two arachnid clades. We hypothesize that phylogenetic analyses of chelicerate interrelationships based on mitochondrial protein-coding genes is biased by three misleading factors: (a) longbranch attraction, (b) derived AT bias in all chelicerate taxa except Limulus polyphemus, and (c) reversed strand bias in Scorpiones, two species of Araneae and the mite Varroa destructor. Thus from a mitogenomic point of view the exact phylogenetic position of Pycnogonida remains an open question, but a sister group relationship between Acari and Pycnogonida as suggested by Hassanin [13] is rather caused by long-branch attraction and higher AT content than to an underlying phylogenetic signal.

Samples and DNA extraction
Specimens of Nymphon gracile were sampled at Concarneau (France) and immediately preserved in pure ethanol (99.8%). DNA from the legs of a single specimen was extracted using the DNeasy Kit (Qiagen, Germany) following the manufacturers protocol.
Successful amplification of PCR products was obtained with the following primer pairs (annealing temperature and approximate length in brackets): Ng-12sr: In addition for amplification of the part spanning from nad4 to rrnl a primer pair already published in [33] was used successfully used: N4: 5'-GGAGCTTCAA CATGAGCTTT-3'/16S2: 5'-GCGACCTCGA TGTTGGATTA A-3' (50°C, 3900 bp). Sequencing of PCR products larger than 1000 bp was done using a primer walking strategy.
Primary analysis of nucleotide sequences was done using the Beckman CEQ 8000 software. Sequences were then aligned and assembled using Bioedit [34]. Protein-coding genes and ribosomal RNA genes were identified by blast-ing on NCBI entrez databases and by comparison with other arthropod mitochondrial genomes. Transfer RNA genes were identified using tRNAscan-SE 1.21 [19] and DOGMA [35].

Phylogenetic analysis
Phylogenetic analysis was performed using nucleotide sequences from mitchondrial protein-coding genes. Amino acid sequences from single genes were aligned by Clustal X [36] with default settings. After retranslation to nucleotides, ambigously aligned parts were omitted from the analysis by making use of Gblocks 0.91b [37], using the "codons" option and default block parameters. Due to the results of a saturation analysis [38] on single codon positions, implemented in DAMBE, version 4.2.13 [39], third codon positions were eliminated from the alignment. The final alignment consisted of 5,711 bp for 20 taxa. A second alignment was obtained by the same procedure but omitting the two species of Acari with longest branches (Varroa destructor and Leptotrombidium pallidum), leading to an alignment consisting of 5,996 bp for 18 taxa.
Two different analyses were performed on both alignments. (1) A maximum likelihood tree was computed using PAUP* ver. 4.0b10 [40]. The model (GTR+I+gamma) and model parameters were chosen according to the AIC with modeltest 3.7 [41]. In addition 100 bootstrap replicates were performed. (2) Bayesian analysis was performed with MrBayes 3.1.2 [42]. 1,000,000 generations were run under the GTR model, with gamma distribution and a proportion of invariant sites. The first 100 out of 1000 trees were discarded as burn-in. Mitochondrial genome data from other species than Nymphon gracile was obtained from the OGRE database [43].